Abstract

In this paper, the capacity and energy efficiency of training-based communication schemes employed for 
transmission over a-priori unknown Rayleigh block fading channels are studied. In these schemes, periodically 
transmitted training symbols are used at the receiver to obtain the minimum mean-square-error (MMSE) estimate 
of the channel fading coefficients. Initially, the case in which the product of the estimate error and transmitted 
signal is assumed to be Gaussian noise is considered. In this case, it is shown that bit energy requirements grow 
without bound as the signal-to-noise ratio (SNR) goes to zero, and the minimum bit energy is achieved at a 
nonzero SNR value below which one should not operate. The effect of the block length on both the minimum bit 
energy and the SNR value at which the minimum is achieved is investigated. Energy efficiency analysis is also 
carried out when peak power constraints are imposed on pilot signals. Flash training and transmission schemes 
are analyzed and shown to improve the energy efficiency in the low-SNR regime. 

In the second part of the paper, the capacity and energy efficiency of training-based schemes are investigated 
when the channel input is subject to peak power constraints. The capacity-achieving input structure is characterized 
and the magnitude distribution of the optimal input is shown to be discrete with a finite number of mass points. 
The capacity, bit energy requirements, and optimal resource allocation strategies are obtained through numerical 
analysis. The bit energy is again shown to grow without bound as SNR decreases to zero due to the presence of 
peakedness constraints. Capacity and energy-per-bit are also analyzed under the assumptions that the transmitter 
interleaves the data symbols before transmission over the channel, and per-symbol peak power constraints are 
imposed. The improvements in energy efficiency when on-off keying with fixed peak power and vanishing duty 
cycle is employed are studied. Comparisons of the performances of training-based and noncoherent transmission 
schemes are provided. 

Index Terms: Channel capacity, energy-per-bit, energy efficiency, training-based transmission, capacity-achieving 
input distribution, optimal resource allocation, Rayleigh block fading channels, channel estimation. 
I. Introduction

In wireless communications, channel conditions vary randomly over time due to mobility and changing 
environment, and the degree of channel side information (CSI) assumed to be available at the receiver and 
transmitter is a key assumption in the study of wireless fading channels. The case in which the channel is 
assumed to be perfectly known at the receiver and/or transmitter has been extensively studied. In an early work, 
Ericsson [1] obtained the capacity of flat fading channels with perfect receiver CSI. More recently, Ozarow et al. 
[2] studied the average and outage capacity values in the cellular mobile radio setting assuming perfect channel 
knowledge at the receiver. Goldsmith and Varaiya [3] analyzed the capacity of flat fading channels with perfect 
CSI at the transmitter and/or receiver. 

The assumption of having perfect channel knowledge is unwarranted when communication is to be
established in a highly mobile environment. This consideration has led to another line of work where both the
receiver and transmitter are assumed to be completely uninformed of the channel conditions. Abou-Faycal et al.
[4] studied the capacity of the unknown Rayleigh fading channel and showed that the optimal input amplitude has
a discrete structure. This is in stark contrast to the optimality of a continuous Gaussian input in known channels.
In [16] and [18], the discreteness of the capacity-achieving amplitude distribution is proven for noncoherent
Rician fading channels under input peakedness constraints. When the input is subject to peak power constraints, 
the discrete nature of the optimal input is shown for a general class of single-input single-output channels 
in [7]. Marzetta and Hochwald [5] gave a characterization of the optimal input structure for unknown multiple- 
antenna Rayleigh fading channels. This analysis subsequently led to the proposal of unitary space-time modulation 
techniques [6]. Chan et al. [8] considered conditionally Gaussian multiple-input multiple-output (MIMO) channels
with bounded inputs and proved the discreteness of the optimal input under certain conditions. Zheng and Tse 
[10] analyzed the multiple-antenna Rayleigh channels and identified the high signal-to-noise ratio (SNR) behavior 
of the channel capacity. 

Heretofore, the two extreme assumptions of having either perfect CSI or no CSI have been discussed. Practical 
wireless systems live in between these two extremes. Unless there is very high mobility, wireless systems generally 
employ estimation techniques to learn the channel conditions, albeit with errors. Hence, it is of utmost interest 
to analyze fading channels with imperfect CSI. Medard [13] investigated the effect upon channel capacity of 
imperfect channel knowledge and obtained upper and lower bounds on the input-output mutual information. 
Lapidoth and Shamai [12] analyzed the effects of channel estimation errors on the performance if Gaussian 
codebooks are used and nearest neighbor decoding is employed. The capacity of imperfectly-known fading 
channels is characterized in the low-SNR regime in [14] and in the high-SNR regime in [9]. 

The aforementioned studies have not considered explicit training and estimation techniques, or the resources
allocated to them. Recently, Hassibi and Hochwald [23] studied training schemes for learning multiple-antenna
channels. In that work, the power and time dedicated to training are optimized by maximizing a lower bound on the
capacity. Similar training techniques are also discussed in [10]. Due to its practical significance, the information- 
theoretic analysis of training schemes has attracted much interest (see e.g., [24]-[35]). Since exact capacity 
expressions are difficult to find, these studies have optimized the training signal power, duration, and placement 
using capacity bounds. Since Gaussian noise is the worst-case uncorrelated additive noise in a Gaussian setting
[23], a capacity lower bound is generally obtained by treating the product of the estimate error and the transmitted
signal as another source of Gaussian noise. In the above-cited work, training symbols are employed solely to
facilitate channel estimation. However, we note that training symbols can also be used for timing- and
frequency-offset synchronization, and channel equalization [36]-[38]. Tong et al. [22] present an overview of
pilot-assisted wireless transmissions and discuss design issues from both information-theoretic and signal processing
perspectives.

Another important concern in wireless communications is the efficient use of limited energy resources. In
systems where energy is at a premium, minimizing the energy cost per unit transmitted information will improve
the efficiency. Hence, the energy required to reliably send one bit is a metric that can be adopted to measure the
performance. Generally, the energy-per-bit requirement is minimized, and hence the energy efficiency is maximized,
if the system operates in the low-SNR regime. In [14], Verdú analyzed the tradeoff between the spectral
efficiency and bit energy in the low-SNR regime for a general class of channels and showed that the normalized
received minimum bit energy of −1.59 dB is achieved as SNR → 0 in average-power-limited channels regardless
of the availability of CSI at the receiver. On the other hand, [14] has proven that if the receiver has imperfect
CSI, the wideband slope, which is the slope of the spectral efficiency curve at zero spectral efficiency, is zero.
Hence, approaching the minimum bit energy of −1.59 dB is extremely slow, and moreover it requires input
signals with increasingly higher peak-to-average power ratios. The impact upon the energy efficiency of limiting
the peakedness of signals is analyzed in [17]. The wideband channel capacity in the presence of input peakedness
constraints is investigated in [15], [19], and [20].

Energy efficiency, which is of paramount importance in many wireless systems, has not been the core focus 
of the aforementioned work on training schemes. Moreover, previous studies optimized the training parameters 
by using capacity lower bounds. These achievable rate expressions are relevant for systems in which the channel 
estimate is assumed to be perfect and transmission and reception is designed for a known channel. Note that 
these assumptions will lead to poor performance unless the SNR is high or the channel coherence time is long. 

The contributions of this paper are the following: 

• We provide an energy efficiency perspective by analyzing the performance of training techniques in the
low-SNR regime. Note that at low SNR levels, the quality of the channel estimate is far from being perfect. We
quantify the performance losses in terms of energy efficiency in the worst-case scenario where the estimate
is assumed to be perfect. We identify an SNR level below which one should avoid operating. We consider
flash training and transmission techniques to improve the performance.

• We obtain the exact capacity of training-based schemes by characterizing the structure of the capacity- 
achieving input distribution under input peak power constraints which are highly relevant in practical 
applications. Optimal resource allocation is performed using the exact capacity values. Improvements in 
energy efficiency with respect to the worst-case scenario are shown. 

• We compare the performances of untrained noncoherent and training-based communication schemes under 
peak power constraints and show through numerical results that performance loss experienced by training- 
based schemes is small even at low SNR levels and small values of coherence time. On the other hand, 
if data symbols are interleaved and experience independent fading, we show that training-based schemes 
outperform noncoherent techniques. 

• We find the attainable bit energy levels in the low-SNR regime when limitations on the peak-to-average 
power ratio are relaxed and on-off keying with fixed power and vanishing duty cycle is used to transmit 
information. 

The organization of the paper is as follows. Section II provides the channel model. In Section III, training-based
transmission and reception is described. In Section IV, we study the achievable rates and energy efficiency in
the case where the product of the channel estimate error and the transmitted signal is assumed to be Gaussian noise.
In Section V, we analyze the capacity and the energy efficiency of training-based schemes when the input is
subject to peak power limitations. Section VI includes our conclusions. Proofs of several results are relegated to
the Appendix.

II. Channel Model 

We consider Rayleigh block-fading channels where the input-output relationship within a block of m symbols
is given by

$$ \mathbf{y} = h\mathbf{x} + \mathbf{n} \tag{1} $$

where $h \sim \mathcal{CN}(0, \gamma^2)$ is a zero-mean circularly symmetric complex Gaussian random variable with variance
$E\{|h|^2\} = \gamma^2$, and $\mathbf{n}$ is a zero-mean, m complex-dimensional Gaussian random vector with covariance matrix
$E\{\mathbf{n}\mathbf{n}^\dagger\} = N_0\mathbf{I}$. $\mathbf{x}$ and $\mathbf{y}$ are the m complex-dimensional channel input and output vectors, respectively. It is
assumed that the fading coefficients stay constant for a block of m symbols and have independent realizations
for each block. It is further assumed that neither the transmitter nor the receiver has prior knowledge of the
realizations of the fading coefficients.

Footnote: $\mathbf{x} \sim \mathcal{CN}(\mathbf{d}, \mathbf{\Sigma})$ is used to denote that $\mathbf{x}$ is a complex Gaussian random vector with mean $E\{\mathbf{x}\} = \mathbf{d}$ and covariance $E\{(\mathbf{x} - \mathbf{d})(\mathbf{x} - \mathbf{d})^\dagger\} = \mathbf{\Sigma}$.

Footnote: In the channel model (1), $\mathbf{y}$, $\mathbf{x}$, and $\mathbf{n}$ are column vectors.



III. Training-Based Transmission and Reception 

We assume that pilot symbols are employed in the system to facilitate channel estimation at the receiver.
Hence, the system operates in two phases, namely training and data transmission. In the training phase, pilot
symbols known at the receiver are sent from the transmitter and the received signal is

$$ \mathbf{y}_t = h\mathbf{x}_t + \mathbf{n}_t \tag{2} $$

where $\mathbf{y}_t$, $\mathbf{x}_t$, and $\mathbf{n}_t$ are $l$-dimensional vectors signifying the fact that $l$ out of $m$ input symbols are devoted to
training. It is assumed that the receiver employs minimum mean-square error (MMSE) estimation to obtain the
estimate

$$ \hat{h} = E\{h \mid \mathbf{y}_t\} = \frac{\gamma^2}{\gamma^2\|\mathbf{x}_t\|^2 + N_0}\,\mathbf{x}_t^\dagger\mathbf{y}_t. \tag{3} $$

With this estimate, the fading coefficient can now be expressed as

$$ h = \hat{h} + \tilde{h} \tag{4} $$

where

$$ \hat{h} \sim \mathcal{CN}\left(0, \frac{\gamma^4\|\mathbf{x}_t\|^2}{\gamma^2\|\mathbf{x}_t\|^2 + N_0}\right) \quad\text{and}\quad \tilde{h} \sim \mathcal{CN}\left(0, \frac{\gamma^2 N_0}{\gamma^2\|\mathbf{x}_t\|^2 + N_0}\right). \tag{5} $$

Note that $\tilde{h}$ denotes the error in the channel estimate. Following the training phase, the transmitter sends the
$(m-l)$-dimensional data vector $\mathbf{x}_d$, and the receiver equipped with the knowledge of the channel estimate operates
on the received signal

$$ \mathbf{y}_d = \hat{h}\mathbf{x}_d + \tilde{h}\mathbf{x}_d + \mathbf{n}_d \tag{6} $$

to recover the transmitted information. We note that since training-based schemes are studied in this paper,
memoryless fading channels in which m = 1 are not considered, and it is assumed throughout the paper that the
block length satisfies m ≥ 2.
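The estimator in (3) and the variances in (5) can be checked numerically. The following is a minimal sketch, assuming standard-library Python only; the function names (`mmse_variances`, `complex_gaussian`, `simulate_error_variance`) and the pilot values used in the check are illustrative choices, not part of the paper.

```python
import random


def mmse_variances(gamma2, pilot_energy, n0):
    """Variances of the MMSE estimate and of its error, as in (5)."""
    denom = gamma2 * pilot_energy + n0
    var_est = gamma2 ** 2 * pilot_energy / denom
    var_err = gamma2 * n0 / denom
    return var_est, var_err


def complex_gaussian(variance, rng):
    """Draw one sample of a circularly symmetric CN(0, variance) variable."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate_error_variance(gamma2, pilots, n0, trials=20000, seed=0):
    """Monte Carlo estimate of E{|h - h_hat|^2} using the estimator in (3)."""
    rng = random.Random(seed)
    energy = sum(abs(x) ** 2 for x in pilots)
    scale = gamma2 / (gamma2 * energy + n0)
    total = 0.0
    for _ in range(trials):
        h = complex_gaussian(gamma2, rng)
        # Training phase (2): y_t = h * x_t + n_t
        y = [h * x + complex_gaussian(n0, rng) for x in pilots]
        # MMSE estimate (3): scale * x_t^dagger y_t
        h_hat = scale * sum(x.conjugate() * yi for x, yi in zip(pilots, y))
        total += abs(h - h_hat) ** 2
    return total / trials
```

With $\gamma^2 = N_0 = 1$ and total pilot energy 2, the two variances are 2/3 and 1/3 and sum to $\gamma^2$, and the simulated error variance should fall close to 1/3.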

IV. Achievable Rates and Energy Efficiency in the Worst Case Scenario 
A. Average Power Limited Case 

In this section, we assume that the input is subject to an average power constraint 

$$ E\{\|\mathbf{x}\|^2\} \le mP. \tag{7} $$

Our overall goal is to identify the bit energy values that can be attained with optimized training parameters such
as the power and duration of pilot symbols. The least amount of energy required to send one information bit
reliably is given by

$$ \frac{E_b}{N_0} = \frac{\mathrm{SNR}}{C(\mathrm{SNR})} \tag{8} $$

where $C(\mathrm{SNR})$ is the channel capacity in bits/symbol. In this section, we follow the general approach in the
literature and consider a lower bound on the channel capacity by assuming that

$$ \mathbf{z} = \tilde{h}\mathbf{x}_d + \mathbf{n}_d \tag{9} $$

is a Gaussian noise vector that has a covariance of

$$ E\{\mathbf{z}\mathbf{z}^\dagger\} = \sigma_{\tilde{h}}^2 E\{\mathbf{x}_d\mathbf{x}_d^\dagger\} + N_0\mathbf{I}, \tag{10} $$

and is uncorrelated with the input signal $\mathbf{x}_d$. With this assumption, the channel model becomes

$$ \mathbf{y}_d = \hat{h}\mathbf{x}_d + \mathbf{z}. \tag{11} $$

This model is called the worst-case scenario since the channel estimate is assumed to be perfect, and the noise
is modeled as Gaussian, which presents the worst case [23]. The capacity of the channel in (11), which acts as
a lower bound on the capacity of the channel in (6), is achieved by a Gaussian input with

$$ E\{\mathbf{x}_d\mathbf{x}_d^\dagger\} = \frac{(1-\delta^*)mP}{m-1}\,\mathbf{I} \tag{12} $$

where $\delta^*$ is the optimal fraction of power allocated to the pilot symbol, i.e., $|x_t|^2 = \delta^* mP$. The optimal value
is given by

$$ \delta^* = \sqrt{\eta(\eta+1)} - \eta \tag{13} $$

where

$$ \eta = \frac{m\,\mathrm{SNR} + (m-1)}{m(m-2)\,\mathrm{SNR}} \quad\text{and}\quad \mathrm{SNR} = \frac{\gamma^2 P}{N_0}. \tag{14} $$

Note that SNR in (14) is the received signal-to-noise ratio. In the average power limited case, sending a single
pilot is optimal because instead of increasing the number of pilot symbols, a single pilot with higher power can
be used and a decrease in the duration of the data transmission can be avoided. Hence, the optimal $\mathbf{x}_d$ is an
$(m-1)$-dimensional Gaussian vector. Since the above results are indeed special cases of those in [23], the details
are omitted. The resulting capacity expression is

\begin{align}
C_L(\mathrm{SNR}) &= \frac{m-1}{m}\,E_w\left\{\log\left(1 + \frac{\phi(\mathrm{SNR})\,\mathrm{SNR}^2}{\psi(\mathrm{SNR})\,\mathrm{SNR} + (m-1)}\,|w|^2\right)\right\} \notag\\
&= \frac{m-1}{m}\,E_w\left\{\log\left(1 + f(\mathrm{SNR})|w|^2\right)\right\} \quad \text{nats/symbol} \tag{15}
\end{align}

where

$$ \phi(\mathrm{SNR}) = \delta^*(1-\delta^*)m^2, \quad\text{and}\quad \psi(\mathrm{SNR}) = (1 + (m-2)\delta^*)m, \tag{16} $$

and $w \sim \mathcal{CN}(0, 1)$. Note also that the expectation in (15) is with respect to the random variable $w$. The bit
energy values in this setting are given by

$$ \frac{E_{b,U}}{N_0} = \frac{\mathrm{SNR}}{C_L(\mathrm{SNR})}\log 2 \tag{17} $$

where $C_L$ is in nats/symbol. (17) provides the least amount of normalized bit energy values in the worst-case
scenario and also serves as an upper bound on the achievable bit energy levels of channel (6). It is shown in [12]
that if the channel estimate is assumed to be perfect, and Gaussian codebooks designed for known channels are
used, and scaled nearest neighbor decoding is employed at the receiver, then the generalized mutual information
has an expression similar to (15) (see [12, Corollary 3.0.1]). Hence, (17) also gives a good indication of the
energy requirements of a system operating in this fashion. The next result provides the asymptotic behavior of
the bit energy as SNR decreases to zero.

Footnote: $E_b/N_0$ is the bit energy normalized by the noise power spectral level $N_0$.
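The closed-form quantities in (13)-(16) are easy to evaluate directly. The sketch below is an illustration under stated assumptions: it normalizes $\gamma^2 = N_0 = 1$ so that $P$ equals SNR, the function names are hypothetical, and `f_direct` rebuilds the effective SNR from the estimate and error variances in (5) purely as a cross-check of the algebra behind $f(\mathrm{SNR})$.

```python
import math


def delta_star(m, snr):
    """Optimal pilot power fraction from (13)-(14)."""
    if m == 2:
        return 0.5  # eta is infinite for m = 2; delta* takes its limiting value
    eta = (m * snr + (m - 1)) / (m * (m - 2) * snr)
    # Algebraically equal to sqrt(eta*(eta+1)) - eta, but numerically stable for large eta.
    return eta / (math.sqrt(eta * (eta + 1.0)) + eta)


def f_eff(m, snr):
    """Effective SNR f(SNR) = phi*SNR^2 / (psi*SNR + m - 1) from (15)-(16)."""
    d = delta_star(m, snr)
    phi = d * (1.0 - d) * m ** 2
    psi = (1.0 + (m - 2) * d) * m
    return phi * snr ** 2 / (psi * snr + (m - 1))


def f_direct(m, snr):
    """Same quantity from the variances in (5), assuming gamma^2 = N0 = 1."""
    d = delta_star(m, snr)
    pilot = d * m * snr                    # pilot energy divided by N0
    data = (1.0 - d) * m * snr / (m - 1)   # per-symbol data power divided by N0
    var_est = pilot / (pilot + 1.0)
    var_err = 1.0 / (pilot + 1.0)
    return var_est * data / (var_err * data + 1.0)
```

At very low SNR, `delta_star` approaches 1/2, and `f_eff` agrees with `f_direct` to rounding error, which is consistent with the interpretation of $f(\mathrm{SNR})$ as the signal-to-noise ratio seen by the data symbols once the estimation error is folded into the noise.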

Proposition 1: The normalized bit energy (17) grows without bound as the signal-to-noise ratio decreases to
zero, i.e.,

$$ \left.\frac{E_{b,U}}{N_0}\right|_{C_L = 0} = \lim_{\mathrm{SNR}\to 0}\frac{\mathrm{SNR}}{C_L(\mathrm{SNR})}\log 2 = \frac{\log 2}{\dot{C}_L(0)} = \infty. \tag{18} $$



Proof: In the low-SNR regime, we have

\begin{align}
C_L(\mathrm{SNR}) &= \frac{m-1}{m}\left(f(\mathrm{SNR})\,E\{|w|^2\} + o(f(\mathrm{SNR}))\right) \tag{19}\\
&= \frac{m-1}{m}\left(f(\mathrm{SNR}) + o(f(\mathrm{SNR}))\right). \tag{20}
\end{align}

As SNR → 0, $\delta^* \to 1/2$, and hence $\phi(\mathrm{SNR}) \to m^2/4$ and $\psi(\mathrm{SNR}) \to m + m(m-2)/2$. Therefore, it can easily
be seen that

$$ f(\mathrm{SNR}) = \frac{m^2}{4(m-1)}\,\mathrm{SNR}^2 + o(\mathrm{SNR}^2) \tag{21} $$

from which we have $\dot{C}_L(0) = 0$. □

Footnote: Unless specified otherwise, all logarithms are to the base e.

Fig. 1. Energy per bit $E_{b,U}/N_0$ vs. SNR in the worst-case scenario
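The low-SNR expansion used in the proof of Proposition 1 can be spelled out as a short worked step. This is a sketch obtained by substituting the limiting value $\delta^* = 1/2$ into (16); the last line is an approximation that keeps only the leading term.

```latex
\begin{align*}
\phi(\mathrm{SNR}) &\to \tfrac{1}{2}\left(1 - \tfrac{1}{2}\right)m^2 = \frac{m^2}{4},
\qquad
\psi(\mathrm{SNR})\,\mathrm{SNR} + (m-1) \to m-1, \\
f(\mathrm{SNR}) &= \frac{\phi(\mathrm{SNR})\,\mathrm{SNR}^2}{\psi(\mathrm{SNR})\,\mathrm{SNR} + (m-1)}
 = \frac{m^2}{4(m-1)}\,\mathrm{SNR}^2 + o(\mathrm{SNR}^2), \\
C_L(\mathrm{SNR}) &\approx \frac{m-1}{m}\cdot\frac{m^2}{4(m-1)}\,\mathrm{SNR}^2 = \frac{m}{4}\,\mathrm{SNR}^2, \\
\frac{E_{b,U}}{N_0} &= \frac{\mathrm{SNR}\,\log 2}{C_L(\mathrm{SNR})} \approx \frac{4\log 2}{m\,\mathrm{SNR}} \to \infty
\quad \text{as } \mathrm{SNR} \to 0.
\end{align*}
```

The leading term shows that, for fixed $m$, the bit energy grows roughly as $1/\mathrm{SNR}$ at low SNR, and that a larger block length only scales this growth down by a factor of $m$.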



The fact that $C_L$ decreases as $\mathrm{SNR}^2$ as SNR goes to zero has already been pointed out in [23]. The reason for
this behavior is that as SNR decreases, the power of $\hat{h}$ in (5) decreases linearly with SNR and hence the quality of the
channel estimate deteriorates. Since the channel estimate is assumed to be perfect, the effective signal-to-noise
ratio decays as $\mathrm{SNR}^2$, leading to the observed result. Proposition 1 shows the impact of this behavior on the
energy-per-bit, and indicates that it is extremely energy-inefficient to operate at very low SNR values. The result
holds regardless of the size of the block length m as long as it is finite. We further conclude that in a
training-based scheme where the channel estimate is assumed to be perfect, the minimum energy per bit is achieved at
a nonzero SNR value. This most energy-efficient operating point can be obtained by numerical analysis. We can
easily compute $C_L(\mathrm{SNR})$ in (15), and hence the bit energy values.
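The numerical analysis mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes standard-library Python, evaluates the expectation in (15) with composite Simpson quadrature using the fact that $|w|^2$ is exponentially distributed with unit mean, and locates the minimum of (17) by a plain search over a logarithmic SNR grid. All function names and grid parameters are hypothetical.

```python
import math


def delta_star(m, snr):
    """Optimal pilot power fraction from (13)-(14)."""
    if m == 2:
        return 0.5  # limiting value, since eta is infinite for m = 2
    eta = (m * snr + (m - 1)) / (m * (m - 2) * snr)
    return eta / (math.sqrt(eta * (eta + 1.0)) + eta)  # stable form of (13)


def f_eff(m, snr):
    """Effective SNR f(SNR) from (15)-(16)."""
    d = delta_star(m, snr)
    phi = d * (1.0 - d) * m ** 2
    psi = (1.0 + (m - 2) * d) * m
    return phi * snr ** 2 / (psi * snr + (m - 1))


def expected_log(f, upper=40.0, steps=2000):
    """E{log(1 + f|w|^2)} for |w|^2 ~ Exp(1), by composite Simpson quadrature."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.log1p(f * t) * math.exp(-t)
    return total * h / 3.0


def bit_energy_db(m, snr):
    """Normalized bit energy (17) in dB, with C_L from (15) in nats/symbol."""
    c_nats = (m - 1) / m * expected_log(f_eff(m, snr))
    return 10.0 * math.log10(snr * math.log(2.0) / c_nats)


def min_bit_energy(m, lo=1e-3, hi=10.0, points=300):
    """Grid search; returns (minimum bit energy in dB, SNR achieving it)."""
    grid = [lo * (hi / lo) ** (i / (points - 1)) for i in range(points)]
    return min((bit_energy_db(m, s), s) for s in grid)
```

Calling `min_bit_energy(m)` for a given block length returns the most energy-efficient operating point on the grid; a finer grid or a one-dimensional minimizer could replace the search if more precision is needed.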

Figure 1 plots the normalized bit energy curves as a function of SNR for block lengths of m = 3, 5, 10, 20, 50, 100,
200, $10^4$. As predicted, for each block length value, the minimum bit energy is achieved at nonzero SNR, and the
bit energy requirement increases as SNR → 0. It has been noted in [23] that training-based schemes, which assume
the channel estimate to be perfect, perform poorly at very low SNR values, and the exact transition point below
which one should not operate in this fashion is deemed as not clear. Here, we propose the SNR level at which the
minimum bit energy is achieved as a transition point since operating below this point results in higher bit energy
requirements. It is further seen in Fig. 1 that the minimum bit energy is attained at an SNR value that satisfies

$$ \frac{d}{d\,\mathrm{SNR}}\left(\frac{E_{b,U}}{N_0}\right) = \frac{d}{d\,\mathrm{SNR}}\left(\frac{\mathrm{SNR}\log 2}{C_L(\mathrm{SNR})}\right) = 0. \tag{22} $$

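Another observation from Fig. 1 is that the minimum bit energy decreases with increasing m and is achieved at
a lower SNR value. The following result sheds light on the asymptotic behavior of the capacity as m → ∞.

Theorem 1: As the block length m increases, $C_L$ approaches the capacity of the perfectly known channel,
i.e.,

$$ \lim_{m\to\infty} C_L(\mathrm{SNR}) = E_w\{\log(1 + \mathrm{SNR}|w|^2)\}. \tag{23} $$

Moreover, define $\chi = 1/m$. Then

$$ \left.\frac{dC_L(\mathrm{SNR})}{d\chi}\right|_{\chi=0} = -\infty. \tag{24} $$

Proof: We have

\begin{align}
\lim_{m\to\infty} C_L(\mathrm{SNR}) &= \lim_{m\to\infty} E_w\left\{\log\left(1 + f(\mathrm{SNR})|w|^2\right)\right\} \tag{25}\\
&= E_w\left\{\lim_{m\to\infty}\log\left(1 + f(\mathrm{SNR})|w|^2\right)\right\} \tag{26}\\
&= E_w\left\{\log\left(1 + \lim_{m\to\infty} f(\mathrm{SNR})\,|w|^2\right)\right\} \tag{27}\\
&= E_w\{\log(1 + \mathrm{SNR}|w|^2)\}. \tag{28}
\end{align}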

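(25) follows from the fact that (m − 1)/m → 1 as m → ∞. For (26) to hold, we invoke the Dominated
Convergence Theorem [40]. Note that

\begin{align}
\left|\log\left(1 + f(\mathrm{SNR})|w|^2\right)\right| &\le f(\mathrm{SNR})|w|^2 \tag{29}\\
&= \frac{\phi(\mathrm{SNR})\,\mathrm{SNR}^2}{\psi(\mathrm{SNR})\,\mathrm{SNR} + (m-1)}\,|w|^2 \tag{30}\\
&\le \frac{\phi(\mathrm{SNR})}{\psi(\mathrm{SNR})}\,\mathrm{SNR}|w|^2 \tag{31}\\
&= \frac{\delta^*(1-\delta^*)m^2}{m + m(m-2)\delta^*}\,\mathrm{SNR}|w|^2 \tag{32}\\
&= \frac{(1-\delta^*)m^2}{\frac{m}{\delta^*} + m(m-2)}\,\mathrm{SNR}|w|^2 \tag{33}\\
&\le \frac{(1-\delta^*)m}{m-2}\,\mathrm{SNR}|w|^2 \tag{34}\\
&\le 3\,\mathrm{SNR}|w|^2 \quad \text{for } m \ge 3 \tag{35}
\end{align}

where (34) is obtained by removing $m/\delta^*$ in the denominator and (35) follows from the facts that $1 - \delta^* < 1$ and
$\frac{m}{m-2} \le 3$ for all m ≥ 3. If m = 2, we have $\phi(\mathrm{SNR}) = 1$, $\psi(\mathrm{SNR}) = 2$, and hence

$$ \left|\log\left(1 + f(\mathrm{SNR})|w|^2\right)\right| = \log\left(1 + \frac{\mathrm{SNR}^2}{2\,\mathrm{SNR} + 1}|w|^2\right) \le \frac{1}{2}\,\mathrm{SNR}|w|^2. \tag{36} $$

Fig. 2. Minimum energy per bit $E_{b,U}/N_0$ vs. $\chi = 1/m$ in the worst-case scenario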



Therefore, $3\,\mathrm{SNR}|w|^2$ is an upper bound that applies for all integer values m ≥ 2. Furthermore, the upper bound
does not depend on m and is integrable, i.e., $E_w\{3\,\mathrm{SNR}|w|^2\} = 3\,\mathrm{SNR} < \infty$. Hence, the Dominated Convergence
Theorem applies and (26) is justified. (27) is due to the fact that the logarithm is a continuous function. (28) can
easily be verified by noting that $\delta^* m^2$ is the fastest growing component, increasing as $m^{3/2}$ with increasing m.

(24) follows again from the application of the Dominated Convergence Theorem and the fact that the derivative
of $f(\mathrm{SNR})$ with respect to $\chi = 1/m$ at $\chi = 0$ is −∞. □

The first part of Theorem 1 is not surprising and is expected because reference [5] has already shown that as the
block length grows, the perfect knowledge capacity is achieved even if no channel estimation is performed. This
result agrees with our observation in Fig. 1 that −1.59 dB is approached at lower SNR values as m increases.
However, the rate of approach is very slow in terms of the block size, as proven in the second part of Theorem
1 and evidenced in Fig. 2. Due to the infinite slope observed in the figure, approaching −1.59 dB is very
demanding in block length.

Footnote: Theorem 1 implies that the slope of $\mathrm{SNR}/C_L(\mathrm{SNR})$ at $\chi = 1/m = 0$ is ∞.
B. Peak Power Constraint on the Pilot

Heretofore, we have assumed that there are no peak power constraints imposed on either the data or pilot
symbols. Recall that the power of the pilot symbol is given by

$$ |x_t|^2 = \delta^* mP = \sqrt{\xi(\xi + mP)} - \xi \tag{37} $$

where $\xi = \frac{m\gamma^2 P + (m-1)N_0}{(m-2)\gamma^2}$. We immediately observe from (37) that the pilot power increases at least as $\sqrt{m}$ as
m increases. For large block sizes, such an increase in the pilot power may be prohibitive in practical systems.
Therefore, it is of interest to impose a peak power constraint on the pilot in the following form:

$$ |x_t|^2 \le kP. \tag{38} $$

Since the average power is uniformly distributed over the data symbols, the average power of a data symbol is
proportional to P and is at most $2(1 - \delta^*)P$ for any block size. Therefore, k can be seen as a limitation on the
peak-to-average power ratio. Note that we will allow Gaussian signaling for data transmission. Hence, there are
no hard peak power limitations on data signals. This approach will enable us to work with a closed-form capacity
expression. Although Gaussian signals can theoretically assume large values, the probability of such values
decreases exponentially. The case in which a peak power constraint is imposed on both the training and data
symbols is treated in Section V.

If the optimal power allocated to a single pilot exceeds kP, i.e., $\delta^* mP > kP \Rightarrow \delta^* m > k$, the peak power
constraint on the pilot becomes active. In this case, more than just a single pilot may be needed for optimal
performance.

In this section, we address the optimization of the number of pilot symbols when each pilot symbol has fixed
power $|x_{t,i}|^2 = kP$. If the number of pilot symbols is $l < m$, then $\|\mathbf{x}_t\|^2 = lkP$ and, as we know from
Section III,

$$ \hat{h} \sim \mathcal{CN}\left(0, \frac{\gamma^4 lkP}{\gamma^2 lkP + N_0}\right) \quad\text{and}\quad \tilde{h} \sim \mathcal{CN}\left(0, \frac{\gamma^2 N_0}{\gamma^2 lkP + N_0}\right). $$

Similarly as before, when the estimate error is assumed to be another source of additive noise and overall additive
noise is assumed to be Gaussian, the input-output mutual information achieved by Gaussian signaling is given
by

$$ I_{L,p} = \frac{m-l}{m}\,E_w\left\{\log\left(1 + g(\mathrm{SNR}, l)|w|^2\right)\right\} \tag{39} $$

where $w \sim \mathcal{CN}(0, 1)$ and

$$ g(\mathrm{SNR}, l) = \frac{lk(m - lk)\,\mathrm{SNR}^2}{(m - lk + (m - l)lk)\,\mathrm{SNR} + m - l}. \tag{40} $$

The optimal value of the training duration $l$ that maximizes $I_{L,p}$ can be obtained through numerical optimization.
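One simple way to carry out this optimization is an exhaustive search over the training duration. The sketch below is an assumption-laden illustration rather than the authors' method: it uses standard-library Python, evaluates the expectation in (39) by Simpson quadrature with $|w|^2$ exponentially distributed with unit mean, and restricts the search to durations with $lk < m$ so that some power is left for data. Function names are hypothetical.

```python
import math


def g_eff(m, l, k, snr):
    """Effective SNR g(SNR, l) in (40) for l pilots, each of power k*P."""
    lk = l * k
    return lk * (m - lk) * snr ** 2 / ((m - lk + (m - l) * lk) * snr + m - l)


def expected_log(f, upper=40.0, steps=2000):
    """E{log(1 + f|w|^2)} for |w|^2 ~ Exp(1), by composite Simpson quadrature."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.log1p(f * t) * math.exp(-t)
    return total * h / 3.0


def rate_peak_pilot(m, l, k, snr):
    """Mutual information I_{L,p} in (39), in nats per symbol."""
    return (m - l) / m * expected_log(g_eff(m, l, k, snr))


def best_training_length(m, k, snr):
    """Exhaustive search over l, keeping only l with leftover data power (l*k < m)."""
    candidates = [l for l in range(1, m) if l * k < m]
    return max(candidates, key=lambda l: rate_peak_pilot(m, l, k, snr))
```

Because the search space has at most $m - 1$ candidates, exhaustive search is cheap even for large blocks; the bit energy for the chosen $l$ then follows from $\mathrm{SNR}\log 2 / I_{L,p}$.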
Fig. 3. Energy per bit $E_{b,U}/N_0$ vs. SNR for block sizes of $m = 50, 100, 200, 500, 10^3, 10^4$. The pilot peak power constraint is
$|x_t|^2 \le 10P$.

Fig. 3 plots the normalized bit energy values $E_{b,U}/N_0$ in dB obtained with optimal training duration for different
block lengths. The peak power constraint imposed on a pilot symbol is $|x_t|^2 \le 10P$. Fig. 4 gives the optimal
number of pilot symbols per block. From Fig. 3, we observe that the minimum bit energy, which is again achieved
at a nonzero value of the SNR, decreases with increasing block length and approaches the fundamental limit of
−1.59 dB. We note from Fig. 4 that the number of pilot symbols per block increases as the block length increases
or as SNR decreases to zero. When there are no peak constraints, $\delta^* \to 1/2$ as SNR → 0. Hence, we need to
allocate approximately half of the available total power mP to the single pilot signal in the low-power regime,
increasing the peak-to-average power ratio. In the limited peak power case, this requirement is translated to the
requirement of more pilot symbols per block at low SNR values.

Table I lists, for different values of m, the minimum bit energy values, the required number of pilot symbols
at this level, and the SNR at which minimum bit energy is achieved. It is again assumed that k = 10. The last
column of the table provides the minimum bit energy attained when there are no peak power constraints on the
pilot signal. As the block size increases, the minimum bit energy is achieved at a lower SNR value while a longer
training duration is required. Furthermore, comparison with the last column indicates that the loss in minimum
bit energy incurred by the presence of peak power constraints is negligible. The following result shows that the
capacity of the perfectly known channel, and hence the minimum bit energy of −1.59 dB, is approached with
simultaneous growth of training duration and block length. Note that this result conforms with the results in
Table I.

Fig. 4. Number of pilot symbols per block vs. SNR

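Proposition 2: Assume that the training duration $l(m, \mathrm{SNR})$ increases as m increases and satisfies

$$ \lim_{m\to\infty}\frac{l(m, \mathrm{SNR})}{m} = 0. \tag{41} $$

Then, $\lim_{m\to\infty} I_{L,p} = E_w\{\log(1 + \mathrm{SNR}|w|^2)\}$.

Proof: We have

\begin{align}
\lim_{m\to\infty} I_{L,p} &= \lim_{m\to\infty}\left(1 - \frac{l}{m}\right)E_w\left\{\log\left(1 + g(\mathrm{SNR}, l)|w|^2\right)\right\} \notag\\
&= \lim_{m\to\infty} E_w\left\{\log\left(1 + g(\mathrm{SNR}, l)|w|^2\right)\right\} \tag{42}\\
&= E_w\left\{\lim_{m\to\infty}\log\left(1 + g(\mathrm{SNR}, l)|w|^2\right)\right\} \tag{43}\\
&= E_w\{\log(1 + \mathrm{SNR}|w|^2)\}. \tag{44}
\end{align}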
TABLE I

             E_b,U/N_0 min (dB)    # of pilots    SNR     E_b,U/N_0 min (dB) (no peak constraints)
m = 50             1.441                1         0.41          1.440
m = 100            0.897                2         0.28          0.871
m = 200            0.413                3         0.22          0.404
m = 500           −0.079                5         0.16         −0.085
m = 10^3          −0.375                9         0.12         −0.378
m = 10^4          −1.007               44         0.05         −1.008



(42) follows from the condition (41). (43) can be justified by invoking the Dominated Convergence Theorem
[40] similarly as in the proof of Theorem 1. (44) follows from

$$ \lim_{m\to\infty} g(\mathrm{SNR}, l) = \mathrm{SNR}, \tag{45} $$

which holds if the conditions of the proposition are met. □
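The limit in (45) can be made explicit with one line of algebra. The following worked step is a sketch that divides the numerator and denominator of (40) by $mlk$ and assumes, as in Proposition 2, that $l$ grows without bound while $l/m \to 0$.

```latex
\begin{align*}
g(\mathrm{SNR}, l)
 &= \frac{lk(m - lk)\,\mathrm{SNR}^2}{\bigl(m - lk + (m-l)lk\bigr)\mathrm{SNR} + m - l} \\
 &= \frac{\left(1 - \frac{lk}{m}\right)\mathrm{SNR}^2}
         {\left(\frac{1}{lk} - \frac{1}{m} + 1 - \frac{l}{m}\right)\mathrm{SNR} + \frac{1}{lk} - \frac{1}{mk}}
 \;\longrightarrow\; \frac{\mathrm{SNR}^2}{\mathrm{SNR}} = \mathrm{SNR}
 \quad \text{as } m \to \infty .
\end{align*}
```

Every correction term vanishes only if both $1/(lk) \to 0$ and $l/m \to 0$, which is why the training duration has to grow with the block length, but more slowly than the block length itself.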

C. Flash Training and Transmission 

One approach to improve the energy efficiency in the low-SNR regime is to increase the peak power of the
transmitted signals. This can be achieved by transmitting $\nu$ fraction of the time with power $P/\nu$. Note that training
also needs to be performed only $\nu$ fraction of the time. In this section, no peak power constraints are imposed
on pilot symbols. This type of training and communication, called flash training and transmission scheme, is
analyzed in [11] where it is shown that the minimum bit energy of −1.59 dB can be achieved if the block length
m increases at a certain rate as SNR decreases. In the setting we consider, the flash transmission scheme achieves
the following rate:

$$ C_{fL}(\mathrm{SNR}, \nu) = \nu(\mathrm{SNR})\,C_L\left(\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}\right) \tag{46} $$

where $0 < \nu(\cdot) \le 1$ is the duty cycle which in general is a function of the SNR. First, we show that flash
transmission using peaky Gaussian signals does not improve the minimum bit energy.

Proposition 3: For any duty cycle function $\nu(\cdot)$,

$$ \inf_{\mathrm{SNR}}\frac{\mathrm{SNR}}{C_{fL}(\mathrm{SNR}, \nu)} \ge \inf_{\mathrm{SNR}}\frac{\mathrm{SNR}}{C_L(\mathrm{SNR})}. \tag{47} $$
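Proof: Note that for any SNR and $\nu(\mathrm{SNR})$,

$$ \frac{\mathrm{SNR}}{C_{fL}(\mathrm{SNR}, \nu)} = \frac{\mathrm{SNR}/\nu(\mathrm{SNR})}{C_L\left(\mathrm{SNR}/\nu(\mathrm{SNR})\right)} = \frac{\widetilde{\mathrm{SNR}}}{C_L(\widetilde{\mathrm{SNR}})} \ge \inf_{\mathrm{SNR}}\frac{\mathrm{SNR}}{C_L(\mathrm{SNR})} \tag{48} $$

where $\widetilde{\mathrm{SNR}} = \mathrm{SNR}/\nu(\mathrm{SNR})$ is defined as the new SNR level. Since the inequality in (48) holds for any SNR and $\nu(\cdot)$, it also holds
for the infimum of the left-hand side of (48), and hence the result follows. □

We classify the duty cycle function into three categories:

1) $\nu(\cdot)$ that satisfies $\lim_{\mathrm{SNR}\to 0}\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})} = 0$

2) $\nu(\cdot)$ that satisfies $\lim_{\mathrm{SNR}\to 0}\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})} = \infty$

3) $\nu(\cdot)$ that satisfies $\lim_{\mathrm{SNR}\to 0}\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})} = a$ for some constant $a > 0$.

Next, we analyze the performance of each category of duty cycle functions in the low-SNR regime.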
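Theorem 2: If $\nu(\cdot)$ is chosen from either Category 1 or 2,

$$ \left.\frac{E_{b,U}}{N_0}\right|_{C_{fL} = 0} = \lim_{\mathrm{SNR}\to 0}\frac{\mathrm{SNR}}{C_{fL}(\mathrm{SNR}, \nu)}\log 2 = \infty. \tag{49} $$

If $\nu(\cdot)$ is chosen from Category 3,

$$ \left.\frac{E_{b,U}}{N_0}\right|_{C_{fL} = 0} = \frac{m}{m-1}\,\frac{a}{E_w\{\log_2(1 + f(a)|w|^2)\}}. \tag{50} $$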



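Proof: We first note that by Jensen's inequality,

\begin{align}
\frac{C_{fL}(\mathrm{SNR}, \nu)}{\mathrm{SNR}} &\le \frac{\nu(\mathrm{SNR})}{\mathrm{SNR}}\,\frac{m-1}{m}\,\log\left(1 + f\left(\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}\right)\right) \tag{51}\\
&\stackrel{\mathrm{def}}{=} \bar{C}(\mathrm{SNR}, \nu). \tag{52}
\end{align}

First, we consider category 1. In this case, as SNR → 0, $\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})} \to 0$. As shown before, the logarithm in (51)
scales as $\frac{\mathrm{SNR}^2}{\nu^2(\mathrm{SNR})}$ as SNR → 0, and hence $\bar{C}(\mathrm{SNR}, \nu)$ scales as $\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}$, leading to

$$ \lim_{\mathrm{SNR}\to 0}\frac{C_{fL}(\mathrm{SNR}, \nu)}{\mathrm{SNR}} \le \lim_{\mathrm{SNR}\to 0}\bar{C}(\mathrm{SNR}, \nu) = 0. \tag{53} $$

In category 2, $\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}$ grows to infinity as SNR → 0. Since the log(·) function on the right-hand side of (51)
increases only logarithmically as $\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})} \to \infty$, we can easily verify that

$$ \lim_{\mathrm{SNR}\to 0}\frac{C_{fL}(\mathrm{SNR}, \nu)}{\mathrm{SNR}} \le \lim_{\mathrm{SNR}\to 0}\bar{C}(\mathrm{SNR}, \nu) = 0. \tag{54} $$

In category 3, $\nu(\mathrm{SNR})$ decreases at the same rate as SNR. In this case, we have

\begin{align}
\lim_{\mathrm{SNR}\to 0}\frac{C_{fL}(\mathrm{SNR}, \nu)}{\mathrm{SNR}} &= \lim_{\mathrm{SNR}\to 0}\frac{\nu(\mathrm{SNR})}{\mathrm{SNR}}\,\frac{m-1}{m}\,E_w\left\{\log\left(1 + f\left(\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}\right)|w|^2\right)\right\} \tag{55}\\
&= \frac{m-1}{m}\,\frac{1}{a}\,E_w\left\{\lim_{\mathrm{SNR}\to 0}\log\left(1 + f\left(\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}\right)|w|^2\right)\right\} \tag{56}\\
&= \frac{m-1}{m}\,\frac{1}{a}\,E_w\left\{\log\left(1 + f(a)|w|^2\right)\right\}. \tag{57}
\end{align}

Fig. 5. Energy per bit $E_{b,U}/N_0$ vs. SNR for non-flashy and flash transmissions.

(56) is justified by invoking the Dominated Convergence Theorem and noting the integrable upper bound

$$ \left|\log\left(1 + f\left(\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}\right)|w|^2\right)\right| \le 3\,\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}\,|w|^2, $$

where the ratio $\frac{\mathrm{SNR}}{\nu(\mathrm{SNR})}$ converges to $a$ and is therefore bounded for all sufficiently small SNR.
The above upper bound is given in the proof of Theorem 1. Finally, (57) follows from the continuity of the
logarithm. □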
Theorem |2] shows that if the rate of the decrease of the duty cycle is faster or slower than SNR as SNR 0, the 
bit energy requirement still increases without bound in the low-SNR regime. This observation is tightly linked to 
the fact that the capacity curve Cl has a zero slope as both SNR and SNR oo. For improved performance 
in the low-SNR regime, it is required that the duty cycle scale as SNR. A particularly good choice is 

i/(SNR) = — SNR 

a* 

where a* is equal to the SNR level at which the minimum bit energy is achieved in a non-flashy transmission 
scheme. With this choice, we basically perform time-sharing between SNR = and SNR = a*. Fig. |5] plots the 
normalized bit energy as a function of SNR for block size m = 10. The minimum bit energy is achieved at 
SNR = 0.8. For SNR < 0.8, flash transmission is employed with i/(snr) = 1/0.8 SNR. As observed in the figure, 
the minimum bit energy level can be maintained for lower values of SNR at the cost of increased peak-to-average 
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power ratio. It should be noted that the optimal point of operation is still at SNR = 0.8 since operating at SNR < 0.8 
will result in reduced data rates without any improvements in the bit energy. From a different perspective, if SNR 
is the signal-to-noise ratio per unit bandwidth, then increasing the bandwidth so that SNR < 0.8 will not produce 
any energy savings. However, in circumstances in which regulations or device properties dictate operation at 
SNR values lower than the minimum bit energy point, flash transmission can be adopted to improve the energy 
efficiency. 
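
The time-sharing argument above can be illustrated numerically. The following is a minimal Python sketch, not the capacity expression of this paper: `toy_capacity` is a hypothetical stand-in rate curve chosen only because it is quadratic at low SNR and logarithmic at high SNR, so that its bit energy has a minimum at a nonzero SNR. With the duty cycle $\nu = \text{SNR}/a^*$, the flash scheme transmits at SNR level $a^*$ for a fraction $\nu$ of the time, and the bit energy stays at its minimum value for all $\text{SNR} < a^*$.

```python
import math


def toy_capacity(snr: float) -> float:
    """Hypothetical rate curve in nats/symbol: ~snr^2 at low SNR, logarithmic at high SNR."""
    return math.log(1.0 + snr * snr / (1.0 + snr))


def bit_energy(snr: float, rate: float) -> float:
    """Normalized bit energy E_b/N_0 = SNR * log(2) / rate, with the rate in nats/symbol."""
    return snr * math.log(2.0) / rate


def flash_rate(snr: float, a_star: float) -> float:
    """Rate with duty cycle nu = snr / a_star: operate at SNR level snr / nu a fraction nu of the time."""
    nu = min(1.0, snr / a_star)
    return nu * toy_capacity(snr / nu)


# Locate the SNR level a* that minimizes the non-flashy bit energy on a coarse grid.
grid = [0.01 * k for k in range(1, 2001)]
a_star = min(grid, key=lambda s: bit_energy(s, toy_capacity(s)))
min_energy = bit_energy(a_star, toy_capacity(a_star))

for snr in (0.5 * a_star, 0.1 * a_star, 0.01 * a_star):
    plain = bit_energy(snr, toy_capacity(snr))
    flash = bit_energy(snr, flash_rate(snr, a_star))
    print(f"SNR={snr:.4f}  non-flashy={plain:.3f}  flash={flash:.3f}  minimum={min_energy:.3f}")
```

In this sketch the flash bit energy equals the minimum for every SNR below $a^*$, while the non-flashy bit energy grows as SNR decreases; the price is a peak-to-average power ratio of $1/\nu$.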

V. Capacity and Energy Efficiency in the Presence of Peak Power Limitations 
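In this section, we consider the channel

$$ \mathbf{y}_d = \hat{h}\,\mathbf{x}_d + \tilde{h}\,\mathbf{x}_d + \mathbf{n}_d \tag{58} $$

and assume that the channel input is subject to the following peak power constraint

$$ \|\mathbf{x}\|^2 \le mP. \tag{59} $$

In this setting, it is again easy to see that the transmission of a single pilot is optimal. Since the peak power
constraint is imposed on the input vector $\mathbf{x}$, the pilot power can be varied instead of increasing the number of
pilot symbols. As before, we assume that the pilot symbol power is

$$ |x_t|^2 = \delta m P. \tag{60} $$

Therefore, the $(m-1)$-dimensional data vector is subject to

$$ \|\mathbf{x}_d\|^2 \le (1-\delta)\, m P. \tag{61} $$

Our goal is to solve the maximization problem

$$ C = \sup_{\delta \in (0,1)} \; \sup_{F_{\mathbf{x}_d}:\ \|\mathbf{x}_d\|^2 \le (1-\delta)mP} \frac{1}{m}\, I(\mathbf{x}_d; \mathbf{y}_d \mid \hat{h}) \tag{62} $$

and obtain the channel capacity, and identify the capacity-achieving input distribution and the optimal value of
the power allocation coefficient $\delta$. The input-output mutual information is

$$ I(\mathbf{x}_d; \mathbf{y}_d \mid \hat{h}) = E_{\hat h} E_{\mathbf{x}_d} \left\{ \int f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h) \log \frac{f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h)}{f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h)}\, d\mathbf{y} \right\} \tag{63} $$

where

$$ f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h) = \frac{\exp\left( -(\mathbf{y} - \hat h \mathbf{x}_d)^\dagger \left(\gamma_{\tilde h}^2\, \mathbf{x}_d \mathbf{x}_d^\dagger + N_0 \mathbf{I}\right)^{-1} (\mathbf{y} - \hat h \mathbf{x}_d) \right)}{\pi^{m-1} N_0^{m-2} \left(\gamma_{\tilde h}^2 \|\mathbf{x}_d\|^2 + N_0\right)} \tag{64} $$

and

$$ f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h) = \int f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h)\, dF_{\mathbf{x}_d}(\mathbf{x}_d). \tag{65} $$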
First, we have the following preliminary result on the structure of the capacity-achieving input distribution.

Theorem 3: For the block fading channel (58) where the input is subject to a peak power limitation (61), the
capacity-achieving input vector can be written as $\mathbf{x}_d = \|\mathbf{x}_d\|\,\mathbf{v}$ where $\|\mathbf{x}_d\|$ is a nonnegative real random variable
and $\mathbf{v}$ is an independent isotropically distributed unit random vector.

Proof: The proof follows primarily from the same techniques developed in [5]. First note the invariance of the
peak constraint (61) to rotations of the input. Since $f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\Phi\mathbf{y}|\Phi\mathbf{x}_d, \hat h) = f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d, \hat h)$ for any $(m-1) \times (m-1)$
dimensional deterministic unitary matrix $\Phi$, it can be easily seen that the mutual information is also invariant to
deterministic rotations of the input, and the result follows from the concavity of the mutual information which
implies that there is no loss in optimality if one uses circularly symmetric input distributions. □
With this characterization, the problem has been reduced to the optimization of the input magnitude distribution,
$F_{\|\mathbf{x}_d\|}$. We first obtain an equivalent expression for the mutual information when the input vector has the structure
described in Theorem 3.

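Theorem 4: When the input is $\mathbf{x}_d = \|\mathbf{x}_d\|\,\mathbf{v}$ where $\mathbf{v}$ is an isotropically distributed unit vector that is
independent of the magnitude $\|\mathbf{x}_d\|$, the input-output mutual information of the channel (58) can be expressed as

$$ I(\mathbf{x}_d;\mathbf{y}_d|\hat h) = I(F_r|\hat h) = -E_{K,r}\left\{ \int_0^\infty f_{R|r,K}(R|r,K) \log g(R,F_r,K)\, dR \right\} - E_r\{\log(1+r^2)\} - (m-1) \tag{66} $$

where

$$ f_{R|r,K}(R|r,K) = \begin{cases} \dfrac{e^{-\frac{Kr^2}{1+r^2}}}{(m-3)!\,(1+r^2)} \displaystyle\int_0^R (R-a)^{m-3}\, e^{-(R-a)}\, e^{-\frac{a}{1+r^2}}\, I_0\!\left(\frac{2\sqrt{Ka}\,r}{1+r^2}\right) da, & m \ge 3 \\[2ex] \dfrac{1}{1+r^2}\, e^{-\frac{R+Kr^2}{1+r^2}}\, I_0\!\left(\dfrac{2\sqrt{KR}\,r}{1+r^2}\right), & m = 2 \end{cases} \tag{67} $$

and

$$ g(R,F_r,K) = \frac{(m-2)!}{R^{m-2}} \int_0^\infty f_{R|r,K}(R|r,K)\, dF_r(r). \tag{68} $$

In the above formulations, $R = \frac{\|\mathbf{y}\|^2}{N_0}$, $r = \frac{\gamma_{\tilde h}\|\mathbf{x}_d\|}{\sqrt{N_0}}$, and $K = \frac{|\hat h|^2}{\gamma_{\tilde h}^2}$. Furthermore, $F_r$ denotes the distribution function
of $r$. $K$ is an exponential random variable with mean $E\{K\} = \frac{E\{|\hat h|^2\}}{\gamma_{\tilde h}^2} = \frac{\gamma^2 \delta m P}{N_0}$. $E_{K,r}$ denotes the expectation
with respect to $K$ and $r$.
Proof: See Appendix A.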

Note that the integral in the mutual information expression in (63) is in general a $2(m-1)$-fold integral. In
(66), this has been reduced to a double integral, providing a significant simplification especially for numerical
analysis. With this result, the channel capacity in nats per symbol can now be reformulated as

$$ C = \sup_{\delta \in (0,1)} C_\delta = \sup_{\delta \in (0,1)} \; \sup_{F_r:\ r \le \sqrt{L}} \frac{1}{m}\, I(F_r|\hat h) \tag{69} $$

where $L = \frac{\gamma^2 (1-\delta) m P}{\gamma^2 \delta m P + N_0}$. Hence, the capacity is obtained through the optimal choices of the power allocation
coefficient $\delta$ and normalized input magnitude distribution $F_r$. Since the inner maximization is over a continuous
alphabet, the existence of the capacity-achieving distribution $F_r$ is not guaranteed. Next, we prove the existence
of a capacity-achieving input distribution and provide a necessary and sufficient condition for an input to be
optimal.
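
The role of the power allocation coefficient $\delta$ in the two normalized quantities above can be made concrete with a short sketch. It assumes $\text{SNR} = \gamma^2 P / N_0$, so that $L = (1-\delta)m\,\text{SNR}/(\delta m\,\text{SNR} + 1)$ and $E\{K\} = \delta m\,\text{SNR}$; the function names are hypothetical and introduced only for illustration.

```python
def normalized_peak_level(delta: float, m: int, snr: float) -> float:
    """L: squared peak of the normalized data magnitude r (assumes SNR = gamma^2 P / N_0)."""
    return (1.0 - delta) * m * snr / (delta * m * snr + 1.0)


def mean_estimate_quality(delta: float, m: int, snr: float) -> float:
    """E{K}: mean of |h_hat|^2 normalized by the estimation-error variance."""
    return delta * m * snr


# More training power (larger delta) improves the estimate but lowers the data peak level.
for delta in (0.05, 0.1, 0.3, 0.5):
    L = normalized_peak_level(delta, m=10, snr=1.0)
    EK = mean_estimate_quality(delta, m=10, snr=1.0)
    print(f"delta={delta:.2f}  L={L:.3f}  E[K]={EK:.3f}")
```

The sketch only exposes the tradeoff that the outer supremum over $\delta$ in (69) resolves; it does not evaluate the mutual information itself.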

Theorem 5: Fix the value of $\delta \in (0,1)$ and consider the inner maximization in (69). There exists an input
distribution that maximizes the mutual information $I(F_r|\hat h)$. Moreover, an input distribution $F_r$ is capacity-achieving
if and only if the following Kuhn-Tucker condition is satisfied:

$$ \Phi(r) = E_K\left\{ \int_0^\infty f_{R|r,K}(R|r,K) \log g(R,F_r,K)\, dR \right\} + \log(1+r^2) + mC_\delta + (m-1) \ge 0 \quad \forall r \in [0, \sqrt{L}] \tag{70} $$

with equality at the points of increase of $F_r$. In the above condition, $C_\delta$ denotes the result of the inner
maximization in (69).
Proof: See Appendix B.

Having shown the existence of the capacity-achieving input distribution and a sufficient and necessary condition 
for an input distribution to be optimal, we turn our attention to the characterization of the optimal input. 

Theorem 6: Fix the value of $\delta \in (0,1)$. The input distribution that maximizes the mutual information $I(F_r|\hat h)$
is discrete with a finite number of mass points.

Proof: The following upper bound is obtained in Appendix B:



g{R, F, K) < (m - 2) e" ^^"^ (71) 



Using this upper bound, we have 

Ek |^7fl|r,K(^k, K)log g{R,Fr, K)di?| = FKi?ii|r,K{log giR,Fr, K)} (72) 

< log(m-2) - FKi^ifir.K { + EKERir,K {Vkr] (73) 

< log(m-2) - FkF^|,,k I + Ek {VK^En^,^^{R}] (74) 

< log(m-2) ^-p-^ 

+ Fk { VK^/{l + Ky +m-l^ . (75) 

(73) follows from (71), and (74) follows from the fact that $E\{\sqrt{R}\} \le \sqrt{E\{R\}}$. Finally, (75) is obtained by
noting that $E_{R|r,K}\{R\} = (1+K)r^2 + m - 1$. Note that the upper bound in (75), and hence the left-hand side of
(70), decreases to $-\infty$ as $r \to \infty$ due to the presence of $-r^2$ in the second term.

(The set of points of increase of a distribution function $F$ is $\{r : F(r-\epsilon) < F(r+\epsilon)\ \forall \epsilon > 0\}$.)
We prove the result by contradiction. Hence, we now assume that the optimal input distribution $F_0$ has an
infinite number of points of increase on a bounded interval. Next, we extend $\Phi(\cdot)$ in (70) to the complex
domain:

$$ \Phi(z) = E_K\left\{ \int_0^\infty f_{R|r,K}(R|z,K) \log g(R,F_r,K)\, dR \right\} + \log(1+z^2) + mC_\delta + (m-1) \tag{76} $$

where $z \in \mathbb{C}$ and log is the principal branch of the logarithm. The Identity Theorem for analytic functions
[41] states that if two functions are analytic in a region and if they coincide for an infinite number of distinct
points having a limiting point, they are equal everywhere in that region. It is shown in Appendix C that $\Phi(z)$
is analytic in a region $\mathcal{D}$ that includes the positive real line. By the above assumption on the optimal input
distribution, $\Phi(z) = 0$ for an infinite number of points having a limiting point in region $\mathcal{D}$. Therefore, by the
Identity Theorem, we should have $\Phi(r) = 0$ for all $r \ge 0$. Clearly, this is not possible from the upper bound
in (74) which diverges to $-\infty$ as $r \to \infty$. Hence, the optimal input cannot have an infinite number of points
of increase on a bounded interval, from which we conclude that the optimal input distribution is discrete with a
finite number of mass points. □
After the characterization of the discrete nature of the optimal input, the optimization problem in (69) can be
solved using vector optimization techniques. Numerical results indicate that the optimal magnitude distribution
$F_r$ has a single mass at the peak level $r = \sqrt{L}$ for low-to-medium received peak $\text{SNR} = \frac{\gamma^2 P}{N_0}$ levels. Hence, all the
information is carried by the isotropically distributed directional unit vector. Therefore, information transmission
is achieved by sending points on the surface of an $(m-1)$-dimensional complex sphere with radius $\sqrt{(1-\delta)mP}$. Note
that the mutual information (in nats per $m$ symbols) achieved by having a single mass at $r = \sqrt{L}$ is

$$ I_{cm} = -E_K\left\{ \int_0^\infty f_{R|r,K}(R|r=\sqrt{L}, K) \log g(R,F_r,K)\, dR \right\} - \log(1+L) - (m-1). \tag{77} $$

Figure 6 plots the capacity values as a function of SNR for block lengths of m = 10, 20, 30 and 40. These capacity
values are achieved with optimal power allocation. The optimal fractions of power allocated to the pilot symbol
are plotted in Fig. 7. Note that for the range of SNR values considered in the figure, the optimal value of $\delta$ is
slightly smaller than $1/m$ and approaches $1/m$ as SNR tends to 0. This power allocation strategy is significantly
different from that of the worst-case scenario in which $\delta^* \to 1/2$ with decreasing SNR.

In the low-SNR regime, the tradeoff between spectral efficiency and energy per bit obtained from $\frac{E_b}{N_0} = \frac{\text{SNR}}{C(\text{SNR})}$
is the key performance measure [14]. If we assume, without loss of generality, that one symbol occupies a 1 s × 1 Hz
time-frequency slot, then the maximum spectral efficiency is $\mathsf{C}(E_b/N_0) = C(\text{SNR}) \log_2 e$ bits/s/Hz where we have
assumed that $C(\text{SNR})$ is in nats/symbol. Fig. 8 plots the bit energy values as a function of the spectral efficiency.
It is again observed that the minimum bit energy is achieved at a nonzero spectral efficiency and the required
bit energy values grow without bound as SNR, and hence the spectral efficiency, is further decreased. Indeed, we
can show the following result.

(The Bolzano-Weierstrass Theorem [40] states that every bounded infinite set of real numbers has a limit point.)

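Theorem 7: Assume that the normalized input magnitude distribution has a single mass and hence the magnitude
is fixed at $r = \sqrt{L}$. For any value of $\delta \in (0,1)$, the normalized bit energy required by this input grows
without bound as the signal-to-noise ratio decreases to zero, i.e.,

$$ \left.\frac{E_b}{N_0}\right|_{\text{SNR}=0} = \lim_{\text{SNR} \to 0} \frac{m\,\text{SNR}}{I_{cm}(\text{SNR})} \log 2 = \frac{m \log 2}{\dot{I}_{cm}(0)} = \infty. \tag{78} $$

Proof: Recall that $L = \frac{\gamma^2(1-\delta)mP}{\gamma^2 \delta m P + N_0}$ and $\text{SNR} = \frac{\gamma^2 P}{N_0}$. Also, note that an expression for $I_{cm}$ is given in (77). By
making a change of variables, we have the following equivalent expression:

$$ I_{cm} = -E_K\left\{ \int_0^\infty f_{R|r,K}(R|\sqrt{L}, K\delta m\,\text{SNR}) \log g(R,F_r,K\delta m\,\text{SNR})\, dR \right\} - \log(1+L) - (m-1) \tag{79} $$

where $K$ is now an exponential random variable with $E\{K\} = 1$, and hence is independent of SNR. We can easily
show that

$$ \left.\frac{d}{d\,\text{SNR}} f_{R|r,K}(R|\sqrt{L}, K\delta m\,\text{SNR})\right|_{\text{SNR}=0} = -(1-\delta)m\,\frac{R^{m-2}}{(m-2)!}\,e^{-R} + (1-\delta)m\,\frac{R^{m-1}}{(m-1)!}\,e^{-R}. \tag{80} $$

Note that

$$ g(R,F_r,K\delta m\,\text{SNR}) = \frac{(m-2)!}{R^{m-2}}\, f_{R|r,K}(R|\sqrt{L}, K\delta m\,\text{SNR}). $$

Using these facts, we can easily prove that

$$ \left.\frac{\partial I_{cm}}{\partial\,\text{SNR}}\right|_{\text{SNR}=0} = 0. \tag{81} $$

□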
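
The bit energy versus spectral efficiency conversion used in this section is easy to evaluate numerically. The sketch below is illustrative only: it applies $E_b/N_0 = \text{SNR}/(C \log_2 e)$, with $C$ in nats/symbol, to the capacity $\log(1+\text{SNR})$ of the perfectly known unfaded Gaussian channel, which is used here purely as a familiar reference curve and is not one of the channels analyzed above. The function names are hypothetical.

```python
import math


def spectral_efficiency_bits(capacity_nats: float) -> float:
    """Convert a rate in nats/symbol to bits/s/Hz, assuming a 1 s x 1 Hz slot per symbol."""
    return capacity_nats * math.log2(math.e)


def bit_energy_db(snr: float, capacity_nats: float) -> float:
    """Normalized bit energy E_b/N_0 in dB for a given SNR and rate in nats/symbol."""
    return 10.0 * math.log10(snr / spectral_efficiency_bits(capacity_nats))


def awgn_capacity(snr: float) -> float:
    """Reference curve: capacity of the unfaded Gaussian channel in nats/symbol."""
    return math.log1p(snr)


for snr in (1.0, 0.1, 0.001):
    c = awgn_capacity(snr)
    print(f"SNR={snr}  spectral efficiency={spectral_efficiency_bits(c):.4f} bits/s/Hz  "
          f"Eb/N0={bit_energy_db(snr, c):.3f} dB")
```

For this reference curve the bit energy decreases monotonically toward $\log 2$ (about $-1.59$ dB) as SNR vanishes, in contrast with the training-based curves, for which a rate with zero slope at $\text{SNR} = 0$ forces the bit energy to diverge.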

In the very low SNR regime, the channel estimate deteriorates and the performance approaches that of noncoherent
Rayleigh block fading channels. As shown in [18], bit energy values required in these channels grow
without bound as $\text{SNR} \to 0$, and the same phenomenon is observed here as well. In the worst-case scenario treated
in Section IV, the performance deterioration at very low SNR levels is due to the fact that poor channel estimates
are assumed to be perfect. In this section, similar observations are the result of the limitations on the peakedness
of the signal. Nevertheless, designing the transmission and reception for the channel in (58) rather than that in (11)
leads to energy gains in the low-SNR regime. Fig. 9 provides a comparison of the bit energy values required in
the worst-case scenario and the scenario where peak power constraints are imposed and optimal signaling and
decoding are employed. In the worst-case scenario, the channel estimate is assumed to be perfect and transmission
and reception are designed for a known channel. This is obviously a poor assumption in the low-SNR regime, and
in Fig. 9 we observe bit energy gains of approximately 1.5 dB when optimal techniques are employed in the
case of m = 10. Note that these gains are achieved when the input is subject to more stringent peak power
constraints. From Fig. 9, we also conclude that in the low-SNR regime, the achievable rate expression in (15) is
a lower bound to the peak-power limited capacity of the channel in (58). Note that (15) will eventually exceed
this capacity value at high SNR levels as it is obtained under less strict average power constraints.

Fig. 8. Bit energy $E_b/N_0$ vs. spectral efficiency $\mathsf{C}(E_b/N_0)$ in pilot-assisted systems with block lengths m = 10, 20, 30 and 40.

In training-based systems, a certain fraction of the time and power that would otherwise be used for data transmission
is allocated to the pilot symbols to facilitate channel estimation. Hence, there is a potential for performance loss
in terms of data rates. However, at the same time, the availability of channel estimates at the receiver tends to
improve the performance. On the other hand, in noncoherent communications, there is no attempt at channel
estimation and communication is performed over unknown channels. The analysis presented in this paper can be
applied to noncoherent communications in a straightforward manner by choosing $\delta = 0$ and replacing $m$ in the
equations by $m + 1$, as no time is allocated to pilot symbols. Hence, for instance, the discrete nature of the optimal
input under peak power constraints can easily be shown for the noncoherent Rayleigh channel as well. However,
the details of this analysis are omitted because the discreteness results are proven for noncoherent Rician fading
channels in [18] and for more general noncoherent MIMO channels in [8]. Here, we present numerical results.
Figures 10 and 11 compare the performances of training-based and noncoherent communication systems. In Fig.
10, the bit energy values are plotted for both schemes when the block length is m = 20. It is observed that for
this relatively small value of the block length, both schemes achieve almost the same minimum bit energy value,
and therefore, the training-based performance is surprisingly rather close to that of the noncoherent scheme even
in the low-SNR regime. Fig. 11 plots the capacity values as a function of the block length at SNR = 5 dB. Here,
we also observe that the performance of training-based schemes comes very close to that of the noncoherent
scheme. Therefore, if having the channel estimate reduces the complexity of the receiver and/or pilot signals are
additionally used for timing and frequency-offset synchronization or channel equalization, training-based schemes
can be preferred over noncoherent communications with small loss in data rates.

Fig. 9. Bit energy $E_b/N_0$ vs. spectral efficiency $\mathsf{C}(E_b/N_0)$ in the worst-case scenario and the scenario of optimal coding-decoding under
input peak-power constraints. The block length is m = 10.

A. Capacity with Ideal Interleaving and Per-symbol Peak Power Constraints 

Since most of the well-known codes are designed to correct errors that occur independently from the location 
of other errors [43], practical communication systems employ interleavers at the transmitters to gain protection 
against error bursts. Deinterleavers are used at the receiver to reverse the interleaving operation. In this section, 
we consider such systems and assume that ideal interleaving is used so that each data symbol experiences 
independent channel conditions. Pilot symbols are inserted periodically after the interleaver. We note that a pilot- 
assisted transmission with ideal interleaving is also studied in [24] and [25] where achievable rates are considered. 
Fig. 10. Bit energy $E_b/N_0$ vs. spectral efficiency $\mathsf{C}(E_b/N_0)$ for training-based and noncoherent communication systems when m = 20.
Fig. 11. Capacity (nats/symbol) vs. block length m for training-based and noncoherent communication systems. SNR = 5 dB.
Since interleaving breaks the channel correlation seen by the data symbols, channel memory can no longer be
taken advantage of in the transmission. Hence, interleaving in general decreases the capacity. Therefore, the 
capacity results in this section can also be regarded as lower bounds on the capacity of a non-interleaved system. 
On the other hand, one advantage of interleaving is the simplification of signaling schemes. 

We continue considering the block fading channel model. Hence, the channel stays constant for a block of $m$
symbols. However, after deinterleaving, the channel output can be expressed as

$$ y_{d,i} = \hat{h}_i x_{d,i} + \tilde{h}_i x_{d,i} + n_i, \qquad i = 1, 2, 3, \ldots \tag{82} $$

Note that due to interleaving, each data symbol $x_{d,i}$ is affected by independent and identically distributed fading
coefficients $h_i = \hat{h}_i + \tilde{h}_i$. In this section, we consider per-symbol peak power constraints, $|x_i|^2 \overset{\text{a.s.}}{\le} P\ \forall i$. Therefore,
the pilot symbol power is $|x_t|^2 = P$. Note that the use of more than one pilot may be optimal. The channel
capacity in this setting is formulated as follows:

$$ C = \sup_{1 \le l \le m} \; \sup_{F_{x_d}:\ |x_d|^2 \le P} \frac{m-l}{m}\, I(x_d; y_d \mid \hat{h}) \tag{83} $$

where $l$ denotes the number of pilot symbols per $m$ symbols, and



The inner maximization in (83) becomes a special case of the inner maximization in (62) when we reduce the
dimensionality of the optimization problem in (62) by choosing m = 2. Therefore, the results on the structure
of the capacity-achieving input immediately apply to the setting we consider in this section. The optimal input
has a uniformly distributed phase. With this characterization, the capacity is

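$$ C = \sup_{1 \le l \le m} \; \sup_{F_r:\ r \le \sqrt{L}} \frac{m-l}{m}\, I(F_r|\hat h) \tag{84} $$

where

$$ I(F_r|\hat h) = -E_{K,r}\left\{ \int_0^\infty f_{R|r,K}(R|r,K) \log g(R,F_r,K)\, dR \right\} - E_r\{\log(1+r^2)\} - 1, \tag{85} $$

$$ f_{R|r,K}(R|r,K) = \frac{1}{1+r^2}\, e^{-\frac{R+Kr^2}{1+r^2}}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right), \tag{86} $$

$$ g(R,F_r,K) = \int_0^\infty f_{R|r,K}(R|r,K)\, dF_r(r), \tag{87} $$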
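
As a numerical sanity check on the conditional density used in the interleaved case, the following minimal Python sketch assumes the form $f_{R|r,K}(R|r,K) = \frac{1}{1+r^2} e^{-(R+Kr^2)/(1+r^2)} I_0\!\big(\frac{2\sqrt{KR}\,r}{1+r^2}\big)$ and verifies that it integrates to one with mean $1 + (1+K)r^2$. The helper names, the power-series evaluation of $I_0$, and the midpoint integration rule are illustrative choices, not part of the original analysis.

```python
import math


def bessel_i0(x: float) -> float:
    """Modified Bessel function I_0(x) via its power series (adequate for moderate x)."""
    q = 0.25 * x * x
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total


def density_m2(R: float, r: float, K: float) -> float:
    """Assumed conditional density of R given r and K for m = 2."""
    s = 1.0 + r * r
    return math.exp(-(R + K * r * r) / s) * bessel_i0(2.0 * math.sqrt(K * R) * r / s) / s


def integrate(f, a: float, b: float, n: int = 20000) -> float:
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


r, K = 0.8, 1.5
total = integrate(lambda R: density_m2(R, r, K), 0.0, 80.0)
mean = integrate(lambda R: R * density_m2(R, r, K), 0.0, 80.0)
print(f"integral of density = {total:.6f}")
print(f"mean of R = {mean:.6f}, compared with 1 + (1 + K) r^2 = {1.0 + (1.0 + K) * r * r:.6f}")
```

The same routine can serve as the inner building block when evaluating $g(R, F_r, K)$ for a discrete $F_r$, since the integral over $F_r$ then reduces to a finite weighted sum of such densities.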

(Note that the input constraints, error variances, and the constants multiplying the mutual information expressions will be different in
the specialized case of (62) and in (83). However, the general structures of the two optimization problems are the same.)
and $R = \frac{|y_d|^2}{N_0}$, $r = \frac{\gamma_{\tilde h}|x_d|}{\sqrt{N_0}}$, $K = \frac{|\hat h|^2}{\gamma_{\tilde h}^2}$, and $L = \frac{\gamma_{\tilde h}^2 P}{N_0}$. Note that K is an exponential



in (84) is discrete with a finite number of mass points.

Next, we present numerical results. Fig. 12 plots, for different values of the block length, the capacity curves
as a function of SNR for training-based schemes. We observe that the capacity values increase with the block
length even though the channel in (82) is memoryless. This performance gain should be attributed to the fact
that the channel estimate improves with increasing block length. Fig. 12 also plots the capacity of the interleaved
noncoherent communications in which no attempt is made to learn the channel. From the comparison of the
capacity curves, we observe that training significantly enhances the data rates when data symbols are interleaved
at the transmitter. In Fig. 13, bit energy curves as a function of the spectral efficiency are plotted. Again, we see
that training-based schemes perform much better in terms of energy efficiency than the noncoherent scheme. In
all cases, the minimum bit energy is achieved at a nonzero spectral efficiency level below which one should not
operate. The bit energy requirement increases without bound as spectral efficiency decreases to zero. When we
compare Figs. 8 and 13, we note that while simplifying the system design, interleaving also incurs a penalty in
energy efficiency. Finally, in Fig. 14, we provide the optimal resource allocations by plotting the optimal number
of pilot symbols per block as a function of SNR for different block length values. We observe that the optimal number
of pilots tends to increase as SNR decreases and approaches $m/2$. Hence, as in Section IV-A, asymptotically half
of the available power in each block should be allocated to the training symbols.

B. Achievable Rates and Bit Energies of On-Off Keying 

In this section, we relax the input constraints and assume that the input is subject to an average power constraint

$$ E\{\|\mathbf{x}\|^2\} \le mP. \tag{88} $$

We consider the channel model (58) where there is no interleaving. Akin to Section IV-C, our goal is to obtain the
attainable bit energy levels when signals with high peak-to-average power ratios are employed. As before, a single
pilot symbol with power $|x_t|^2 = \delta m P$ is used and hence the data vector is subject to $E\{\|\mathbf{x}_d\|^2\} \le (1-\delta)mP$.
The data vector is again assumed to have an isotropically distributed directional vector $\mathbf{v}$, and hence $\mathbf{x}_d = \|\mathbf{x}_d\|\,\mathbf{v}$.
We further assume that on-off keying is used for magnitude modulation and therefore

$$ \|\mathbf{x}_d\| = \begin{cases} \sqrt{m}\,A & \text{with prob. } p_0 \\ 0 & \text{with prob. } 1 - p_0 \end{cases} \tag{89} $$




Fig. 12. Capacity (nats/symbol) vs. SNR for interleaved, training-based transmissions when block lengths are m = 10, 20, 30, 40 and 
50, and for interleaved noncoherent transmission over the unknown Rayleigh fading channel. 



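where $A$ is a fixed magnitude level that does not vary with the power $P$. In order to satisfy the average power
constraint we should have

$$ A^2 p_0 = (1-\delta)P \;\Rightarrow\; p_0 = \frac{(1-\delta)P}{A^2}. \tag{90} $$

Therefore, in this signaling scheme, the peak power of the transmitted data signal is kept constant while its
probability vanishes as $P \to 0$. Hence, while the peak power is fixed, the peak-to-average power ratio grows
without bound as $P \to 0$. As before, we define $r = \frac{\gamma_{\tilde h}\|\mathbf{x}_d\|}{\sqrt{N_0}}$. With this definition, the distribution of $r$ is

$$ r = \begin{cases} \dfrac{\gamma_{\tilde h}\sqrt{m}\,A}{\sqrt{N_0}} & \text{with prob. } p_0 = \dfrac{(1-\delta)P}{A^2} \\[1.5ex] 0 & \text{with prob. } 1 - p_0 \end{cases} \tag{91} $$

We further define $\nu = \frac{\gamma^2 A^2}{(1-\delta)N_0}$, which does not depend on $P$, and $\text{SNR} = \frac{\gamma^2 P}{N_0}$. Now, we can write

$$ r = \begin{cases} r_0 = \sqrt{\dfrac{(1-\delta)m\nu}{\delta m\,\text{SNR} + 1}} & \text{with prob. } p_0 = \dfrac{\text{SNR}}{\nu} \\[1.5ex] 0 & \text{with prob. } 1 - p_0 \end{cases} \tag{92} $$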
Fig. 13. Bit energy $E_b/N_0$ vs. spectral efficiency $\mathsf{C}(E_b/N_0)$ for interleaved, training-based transmissions when block lengths are m =
10, 20, 30, 40 and 50, and for interleaved noncoherent transmission over the unknown Rayleigh fading channel.



For a given value of $\delta$, the mutual information achieved by the isotropically distributed directional vector $\mathbf{v}$ and
$r$ whose distribution is given in (92) is

$$ I_{ook} = -E_K\left\{ \int_0^\infty f_{R|K}(R|K) \log\left( \frac{(m-2)!}{R^{m-2}}\, f_{R|K}(R|K) \right) dR \right\} - p_0 \log(1 + r_0^2) - (m-1) \tag{93} $$

where $f_{R|K}(R|K) = (1-p_0)\, f_{R|r,K}(R|r=0, K) + p_0\, f_{R|r,K}(R|r=r_0, K)$ and $f_{R|r,K}(R|r,K)$ is given in (67).
Note that $K$ is an exponential random variable with mean $E\{K\} = \frac{\gamma^2 \delta m P}{N_0} = \delta m\,\text{SNR}$. Next, we obtain the bit
energy required for reliable communications with OOK as $\text{SNR} \to 0$.

Theorem 9: Assume that the normalized input magnitude distribution is given by (92). For a given value of
$\delta \in (0,1)$, the normalized bit energy required by this input as $P \to 0$ is

$$ \frac{E_{b,ook}}{N_0} = \lim_{\text{SNR} \to 0} \frac{m\,\text{SNR}}{I_{ook}(\text{SNR})} \log 2 = \frac{m \log 2}{\dot{I}_{ook}(0)} = \frac{\log 2}{(1-\delta) - \frac{1}{m\nu}\log(1 + (1-\delta)m\nu)}. \tag{94} $$

Proof: As in the proof of Theorem 2, we apply a change of variables and express the mutual information as

$$ I_{ook} = -E_K\left\{ \int_0^\infty f_{R|K}(R|K\delta m\,\text{SNR}) \log\left( \frac{(m-2)!}{R^{m-2}}\, f_{R|K}(R|K\delta m\,\text{SNR}) \right) dR \right\} - p_0 \log(1 + r_0^2) - (m-1) \tag{95} $$
Fig. 14. Number of pilot symbols per block vs. SNR for interleaved, training-based transmissions when block lengths are m =
10, 20, 30, 40 and 50.



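where $K$ is now an exponential random variable with mean $E\{K\} = 1$. It can be easily seen that

$$ \left.\frac{d}{d\,\text{SNR}}\left( -p_0 \log(1 + r_0^2) \right)\right|_{\text{SNR}=0} = -\frac{1}{\nu}\log(1 + (1-\delta)m\nu). \tag{96} $$

We can also show that

$$ \left.\frac{d}{d\,\text{SNR}} f_{R|K}(R|K\delta m\,\text{SNR})\right|_{\text{SNR}=0} = -\frac{1}{\nu}\, f_{R|r,K}(R|r=0, K=0) + \frac{1}{\nu}\, f_{R|r,K}\!\left(R \,\middle|\, r=\sqrt{(1-\delta)m\nu},\, K=0\right). \tag{97} $$

Using (97), we can prove that the derivative of the first term on the right-hand side of (95) with respect to SNR
at $\text{SNR} = 0$ is $(1-\delta)m$. Combining this result with (96), we arrive at

$$ \dot{I}_{ook}(0) = (1-\delta)m - \frac{1}{\nu}\log(1 + (1-\delta)m\nu) \tag{98} $$

which concludes the proof. □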
Theorem 9 shows that unlike previously treated cases, reliable communications with OOK modulation with
fixed peak power requires finite bit energy as $P \to 0$. Hence, OOK provides significant improvements in energy
efficiency in the low-SNR regime at the cost of high peak-to-average power ratio. Since $\nu = \frac{\gamma^2 A^2}{(1-\delta)N_0}$, we can also
express the asymptotic bit energy level as

$$ \frac{E_{b,ook}}{N_0} = \frac{\log 2}{(1-\delta)\left( 1 - \frac{N_0}{m\gamma^2 A^2}\log\left(1 + \frac{m\gamma^2 A^2}{N_0}\right) \right)}. \tag{99} $$



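It has been shown in [18] that, if noncoherent communications with no channel estimation is performed and the
input is subject to $E\{\|\mathbf{x}\|^2\} \le mP$ and $\|\mathbf{x}\|^2 \overset{\text{a.s.}}{\le} mA^2$, then optimal signaling requires the following bit energy
value as $P \to 0$:

$$ \left.\frac{E_{b,noncoh}}{N_0}\right|_{\mathsf{C}=0} = \frac{\log 2}{1 - \frac{N_0}{m\gamma^2 A^2}\log\left(1 + \frac{m\gamma^2 A^2}{N_0}\right)}. \tag{100} $$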

We note that similar results for fading channels with memory are obtained in [21] through the analysis of capacity
per unit cost. Comparing (99) and (100), we find that training-based schemes suffer an energy penalty due to
the presence of the term $1/(1-\delta)$, and this penalty vanishes if $\delta \to 0$. Therefore, if OOK with fixed power is
employed, the power of the training symbols should be decreased to zero as $P \to 0$ to match the noncoherent
performance. This power allocation policy is in stark contrast to the results in the previous sections. Note that as
SNR decreases, data transmission occurs extremely infrequently. In such a case, performing channel estimation
all the time for each m-block irrespective of whether or not data transmission takes place is not a good design
choice. Hence, a gradual decrease in the power allocated to training should also be intuitively expected. We
further remark that as $m \to \infty$ and $\delta \to 0$, $\frac{E_{b,ook}}{N_0} \to -1.59$ dB.
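
The limiting behavior described above can be checked with a few lines of code. The sketch below assumes that the asymptotic OOK bit energy has the form $\log 2 / \big((1-\delta)(1 - \log(1+mx)/(mx))\big)$, where $x$ stands for the fixed peak level $\gamma^2 A^2/N_0$; both the function name and the symbol $x$ are introduced here only for illustration.

```python
import math


def ook_bit_energy(delta: float, m: int, peak_snr: float) -> float:
    """Assumed asymptotic E_b/N_0 of training-based OOK; peak_snr stands for gamma^2 A^2 / N_0."""
    x = m * peak_snr
    return math.log(2.0) / ((1.0 - delta) * (1.0 - math.log1p(x) / x))


def to_db(value: float) -> float:
    """Convert a linear ratio to decibels."""
    return 10.0 * math.log10(value)


# The training penalty is the factor 1/(1 - delta); it disappears as delta -> 0.
for delta in (0.5, 0.1, 0.01, 0.0):
    print(f"m=10, delta={delta}:  Eb/N0 = {to_db(ook_bit_energy(delta, 10, 1.0)):.3f} dB")

# Growing block length with vanishing training power approaches log 2.
for m in (10, 10**3, 10**6, 10**9):
    print(f"m={m}, delta=1e-9:  Eb/N0 = {to_db(ook_bit_energy(1e-9, m, 1.0)):.3f} dB")
```

Under this assumed form, halving the data power fraction (for example $\delta = 0.5$ versus $\delta = 0$) costs exactly 3 dB, and the approach to $-1.59$ dB in $m$ is only logarithmic.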



Fig. 15 plots the bit energy levels as a function of spectral efficiency for training-based OOK with fixed peak
power and for training-based optimal signaling under input peak power constraints in the form $\|\mathbf{x}_d\|^2 \overset{\text{a.s.}}{\le} (1-\delta)mP$.
In this figure, the block length is m = 10, and for OOK, = 1. As predicted, below the spectral efficiency 
of approximately 0.4 bits/s/Hz, OOK provides better energy efficiency. The bit energy requirement of OOK
decreases as spectral efficiency decreases, as opposed to the behavior observed in the peak-power-limited case.
Numerical analysis has also shown that the fraction of power allocated to training, $\delta$, in OOK decreases as SNR
decreases, in agreement with the discussion in the previous paragraph.

VI. Conclusion 

In this paper, we have studied the energy efficiency and capacity of training-based communication schemes 
employed for the transmission of information over a-priori unknown Rayleigh block fading channels. We have 
initially considered the worst-case scenario in which the product of the estimate error and transmitted signal is 
assumed to be Gaussian noise. The capacity expression obtained under this assumption is a lower bound to the true 
capacity of the channel, and provides the achievable rates when the communication system is designed as if the 
channel estimate were perfect. We have investigated the bit energy levels required for reliable communications and 
Fig. 15. Bit energy $E_b/N_0$ vs. spectral efficiency $\mathsf{C}(E_b/N_0)$ for training-based OOK signaling and training-based optimal signaling under
input peak power constraints. The block length is m = 10.

quantified the penalty in energy efficiency incurred due to regarding the imperfect channel estimate as perfect in
the low-SNR regime. We have shown that the bit energy requirements grow without bound as $\text{SNR} \to 0$ regardless
of the block length $m$. Hence, the minimum bit energy is achieved at a nonzero SNR value below
which one should not operate under the aforementioned assumptions. We have also shown that approaching the
minimum bit energy level of −1.59 dB is extremely slow in terms of block length as $m \to \infty$. Similar results
are obtained if peak power limitations are imposed on training symbols. We have also investigated flash training
and transmission schemes to improve the energy efficiency at low SNR levels. We have shown that in order for
the bit energy requirement not to grow as $\text{SNR} \to 0$, the duty cycle in flash transmission should vanish linearly
with decreasing SNR.

Next, we have analyzed the capacity and energy efficiency of training-based schemes when the input is subject
to peak power constraints. We have shown that the capacity-achieving input has a discrete magnitude and
an isotropically distributed unit directional vector. Using this characterization, we have obtained the capacity
expressions, optimal training power allocations, and bit energy levels required for reliable communications. We
have noted that at low SNRs, the optimal input magnitude is fixed at a constant level. Due to the presence of
the peak power constraints, the bit energy requirements are again shown to increase without bound as $\text{SNR} \to 0$.
However, we have seen that gains in energy efficiency are obtained when optimal signaling and decoding
are employed. We have compared the performances of training-based and noncoherent transmission schemes.
Although training-based schemes dedicate a certain amount of time and power to training symbols and as a result
are expected to suffer in terms of data rates, we have observed that the performance loss is small even at relatively
small block lengths and small SNR levels. We have also considered the case in which interleaving is used at the
transmitter for protection against error bursts and per-symbol peak power constraints are imposed. We have
obtained the channel capacity and the optimal training duration, and analyzed the energy efficiency. In this case, training
is shown to improve the performance with respect to noncoherent communications. We have also investigated the
improvements in energy efficiency in the low-SNR regime if OOK with fixed peak power and vanishing duty cycle
is employed at the transmitter. Finally, we note that this work has primarily focused on block fading channels.
Recently, in [33], [34] and [35], we have considered more general fading processes with memory. Since the exact
capacity is rather difficult to obtain in such cases, achievable rate expressions are analyzed, and subsequently
energy efficiency and optimal resource allocations are studied.
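Appendix

A. Derivation of the Mutual Information Expression in Theorem 4
The input-output mutual information expression for channel (58) is

$$ I(\mathbf{x}_d;\mathbf{y}_d|\hat h) = E_{\hat h} E_{\mathbf{x}_d}\left\{ \int f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h) \log \frac{f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h)}{f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h)}\, d\mathbf{y} \right\} \tag{101} $$

$$ = -E_{\hat h} E_{\mathbf{x}_d}\left\{ \int f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h) \log f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h)\, d\mathbf{y} \right\} - E_{\mathbf{x}_d}\left\{ \log\left( \pi^{m-1} N_0^{m-2} e^{m-1} (\gamma_{\tilde h}^2 \|\mathbf{x}_d\|^2 + N_0) \right) \right\} \tag{102} $$

$$ = -E_{\hat h} E_{\mathbf{x}_d}\left\{ \int f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h) \log f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h)\, d\mathbf{y} \right\} - E_r\{\log(1+r^2)\} - \log(\pi^{m-1} N_0^{m-1}) - (m-1). \tag{103} $$

Note that the second part of (102) is the conditional differential entropy of $\mathbf{y}$ given $\mathbf{x}_d$ and $\hat h$. (103) follows from
the definition $r = \frac{\gamma_{\tilde h}\|\mathbf{x}_d\|}{\sqrt{N_0}}$. The main difficulty is to simplify

$$ \chi(\mathbf{x}_d, \hat h) = \int f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h) \log f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h)\, d\mathbf{y} \tag{104} $$

which, in general, is a $2(m-1)$-fold integral. Note that

$$ f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h) = \int f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h)\, dF_{\mathbf{x}_d}. \tag{105} $$

Using the facts that $f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\Phi\mathbf{y}|\Phi\mathbf{x}_d, \hat h) = f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d, \hat h)$ and that the input has circular symmetry, we can easily see that
for any fixed unitary matrix $\Phi$

$$ f_{\mathbf{y}|\hat h}(\Phi\mathbf{y}|\hat h) = f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h) = f_{\mathbf{y}|\hat h}(\|\mathbf{y}\| \,|\, \hat h) \tag{106} $$

and hence

$$ \chi(\Phi\mathbf{x}_d, \hat h) = \chi(\mathbf{x}_d, \hat h) = \chi(\|\mathbf{x}_d\|, \hat h). \tag{107} $$

Therefore, $f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h)$ and $\chi(\mathbf{x}_d, \hat h)$ are circularly symmetric functions depending only on $\|\mathbf{y}\|$ and $\|\mathbf{x}_d\|$, respectively.
Noting that

$$ \left(\gamma_{\tilde h}^2\, \mathbf{x}_d \mathbf{x}_d^\dagger + N_0 \mathbf{I}\right)^{-1} = \frac{1}{N_0}\mathbf{I} - \frac{\gamma_{\tilde h}^2}{N_0(\gamma_{\tilde h}^2\|\mathbf{x}_d\|^2 + N_0)}\, \mathbf{x}_d \mathbf{x}_d^\dagger \tag{108} $$

and defining $\mathbf{x}_d = \|\mathbf{x}_d\|\,\mathbf{v}$ and $\mathbf{y} = \|\mathbf{y}\|\,\mathbf{w}$, we can, after some algebraic steps, rewrite the conditional density
function in (64) as

$$ f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h) = \frac{\exp\left( -\frac{\|\mathbf{y}\|^2}{N_0} - \frac{|\hat h|^2\|\mathbf{x}_d\|^2}{\gamma_{\tilde h}^2\|\mathbf{x}_d\|^2+N_0} + \frac{\gamma_{\tilde h}^2\|\mathbf{x}_d\|^2\|\mathbf{y}\|^2|\mathbf{w}^\dagger\mathbf{v}|^2}{N_0(\gamma_{\tilde h}^2\|\mathbf{x}_d\|^2+N_0)} + \frac{2\|\mathbf{x}_d\|\|\mathbf{y}\||\hat h|\,\Re(e^{j\theta_{\hat h}}\mathbf{w}^\dagger\mathbf{v})}{\gamma_{\tilde h}^2\|\mathbf{x}_d\|^2+N_0} \right)}{\pi^{m-1}N_0^{m-2}(\gamma_{\tilde h}^2\|\mathbf{x}_d\|^2+N_0)} \tag{109} $$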
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where denotes the real part of the complex number z, and 0^ is the phase of h. The usefulness of ( 11091) 
comes from the property that the magnitude ||X(i|| and the directional unit vector v are separated. We know from 
Theorem [3] that v is isotropically distributed and independent of ||X(i||. Hence, we now have 

, P^n i-ML _ \^n^^? , 7^l|x.|P||y|P|wtvP 2||x.||||y|||fe| SR(e^''>^ w1 v) \ 

= J .--iAr--^(^lx,f + Aro) f^Wl..,- (HO) 

where /v is the probability density function of v. Since is a function of only ||y||, we can, without loss of 
generality, assume that -w^ = [1, 0, 0, ... , 0]. In such a case, 

, P^n (-Ml _ , 7^||x,|P||y|P|^i|^ 2||x,||||y|i|fe|SR(e^''ft^;i) \ 

# ^ f^'^I^l, iVo 7^|lx,|P+7Vo Afo(7^||x,|P+Af„) "^^ 7^||x,|P+iVo ^ 

= y .»^-iAr--^(^2||,^p + ^.^) A(-iMi^||x,, (111) 

where $v_1$ is the first component of $\mathbf{v}$ and $f_{v_1}$ is the corresponding density function. From [5], we have for $m \ge 3$ 

\[ f_{v_1}(v_1) = \frac{1}{2\pi}\, 2(m-2)\left(1-|v_1|^2\right)^{m-3}, \qquad |v_1| \le 1. \tag{112} \]

Hence, $v_1$ has a uniform phase and a magnitude whose density function is 

\[ f_{|v_1|}(|v_1|) = 2(m-2)\,|v_1|\left(1-|v_1|^2\right)^{m-3}. \tag{113} \]
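As a quick sanity check on (113), the following sketch numerically integrates the magnitude density and its second moment. It assumes that the direction vector has m-1 entries (the data portion of a block of length m), in which case symmetry suggests that the second moment of |v_1| should equal 1/(m-1). The function names and the midpoint-rule integrator are illustrative choices, not part of the original derivation.

```python
import math

def magnitude_density(rho, m):
    """Density of |v_1| as read from (113): 2(m-2) rho (1 - rho^2)^(m-3), for m >= 3."""
    return 2.0 * (m - 2) * rho * (1.0 - rho * rho) ** (m - 3)

def integrate_midpoint(func, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def check_density(m):
    """Return (total mass, second moment) of the magnitude density for block length m."""
    mass = integrate_midpoint(lambda x: magnitude_density(x, m), 0.0, 1.0)
    second = integrate_midpoint(lambda x: x * x * magnitude_density(x, m), 0.0, 1.0)
    return mass, second

if __name__ == "__main__":
    for m in (3, 5, 10):
        mass, second = check_density(m)
        # Expect mass close to 1 and second moment close to 1/(m-1).
        print(m, mass, second, 1.0 / (m - 1))
```

The second-moment check is only a consistency test under the stated assumption on the dimension; it does not replace the reference [5] for the density itself.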

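Note that if $m = 2$, then $\mathbf{v}$ is one-dimensional and hence $\mathbf{x}_d = \|\mathbf{x}_d\|\mathbf{v} = \|\mathbf{x}_d\|e^{j\theta_{x_d}}$. Therefore, in this case, 
$|\mathbf{v}| = |v_1| = 1$ with probability one. Using these facts and defining $r = \frac{\gamma_{\tilde h}\|\mathbf{x}_d\|}{\sqrt{N_0}}$, $R = \frac{\|\mathbf{y}\|^2}{N_0}$, $K = \frac{|\hat h|^2}{\gamma_{\tilde h}^2}$, and $a = |v_1|^2$, 
we obtain 

\begin{align}
f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h) &= \frac{1}{\pi^{m-1}N_0^{m-1}} \times
\begin{cases}
\displaystyle \int dF_r(r)\, \frac{(m-2)\, e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2} \int_0^1 (1-a)^{m-3} e^{-\frac{Rr^2(1-a)}{1+r^2}} I_0\!\left(\frac{2\sqrt{KRa}\,r}{1+r^2}\right) da & m \ge 3 \\[2ex]
\displaystyle \int dF_r(r)\, \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) & m = 2
\end{cases} \tag{114} \\
&= \frac{g(R,F_r,K)}{\pi^{m-1}N_0^{m-1}} \tag{115}
\end{align}

where $g(R,F_r,K)$ is defined in (68). Therefore, we have 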



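\begin{align}
\chi(\mathbf{x}_d,\hat h) &= \int f_{\mathbf{y}|\mathbf{x}_d,\hat h}(\mathbf{y}|\mathbf{x}_d,\hat h) \log f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h)\, d\mathbf{y} \tag{116} \\
&= E_{\mathbf{y}|\mathbf{x}_d,\hat h}\left\{ \log f_{\mathbf{y}|\hat h}(\mathbf{y}|\hat h) \right\} \tag{117} \\
&= E_{R|r,K}\left\{ \log \frac{g(R,F_r,K)}{\pi^{m-1}N_0^{m-1}} \right\} \tag{118} \\
&= -\log(\pi^{m-1}N_0^{m-1}) + E_{R|r,K}\left\{ \log g(R,F_r,K) \right\} \tag{119} \\
&= -\log(\pi^{m-1}N_0^{m-1}) + \int_0^\infty f_{R|r,K}(R|r,K) \log g(R,F_r,K)\, dR \tag{120}
\end{align}

where $f_{R|r,K}(R|r,K)$ is the conditional density function of $R$ given $r$ and $K$. Combining (103) and (120), we get 

\[ I(\mathbf{x}_d;\mathbf{y}_d|\hat h) = -E_K E_r \int_0^\infty f_{R|r,K}(R|r,K) \log g(R,F_r,K)\, dR - E_r\{\log(1+r^2)\} - (m-1) \tag{121} \]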
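which is the mutual information expression provided in Theorem 4. The proof will be completed by showing that 
$f_{R|r,K}(R|r,K)$ has the expression given in (67). From the previous development, we can easily verify that 

\[ f_{R_1,\ldots,R_{m-1}|r,K}(R_1,\ldots,R_{m-1}|r,K) =
\begin{cases}
\displaystyle \frac{(m-2)\, e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2} \int_0^1 (1-a)^{m-3} e^{-\frac{Rr^2(1-a)}{1+r^2}} I_0\!\left(\frac{2\sqrt{KRa}\,r}{1+r^2}\right) da & m \ge 3 \\[2ex]
\displaystyle \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) & m = 2
\end{cases} \tag{122} \]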



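where $f_{R_1,\ldots,R_{m-1}|r,K}$ is the conditional joint density function of $R_1, \ldots, R_{m-1}$ given $r$ and $K$. Here we have defined 
$R_i = \frac{|y_i|^2}{N_0}$, where $y_i$ is the $i$th component of $\mathbf{y}$, and hence $R = \frac{\|\mathbf{y}\|^2}{N_0} = R_1 + R_2 + \cdots + R_{m-1}$. Note that the joint probability density 
function depends only on the sum $R$. We have the following relationship: 

\begin{align}
\int f_R(R|r,K)\, dR &= \int f_{R_1,\ldots,R_{m-1}|r,K}(R_1,\ldots,R_{m-1}|r,K)\, dR_1 \cdots dR_{m-1} \tag{123} \\
&= \int dR_2 \cdots dR_{m-1} \int_{R_2+\cdots+R_{m-1}}^{\infty} f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K)\, dR \tag{124} \\
&= \int dR_3 \cdots dR_{m-1} \int_{R_3+\cdots+R_{m-1}}^{\infty} f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K) \left( \int_0^{R-(R_3+\cdots+R_{m-1})} dR_2 \right) dR \tag{125} \\
&= \int dR_3 \cdots dR_{m-1} \int_{R_3+\cdots+R_{m-1}}^{\infty} \left( R - (R_3+\cdots+R_{m-1}) \right) f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K)\, dR \tag{126} \\
&= \int_0^\infty \frac{R^{m-2}}{(m-2)!}\, f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K)\, dR. \tag{127}
\end{align}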



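(124) follows by applying the change of variables $R = R_1 + R_2 + \cdots + R_{m-1}$ in the integral with respect 
to $R_1$. (125) is obtained by interchanging the integrals with respect to $R_2$ and $R$. (126) follows by evaluating the 
rightmost integral in (125). Finally, (127) is obtained through the repeated application of this procedure. From 
(127), we have 

\begin{align}
f_R(R|r,K) &= \frac{R^{m-2}}{(m-2)!}\, f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K) \tag{128} \\
&= \begin{cases}
\displaystyle \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2} \int_0^1 (1-a)^{m-3} e^{-\frac{Rr^2(1-a)}{1+r^2}} I_0\!\left(\frac{2\sqrt{KRa}\,r}{1+r^2}\right) da & m \ge 3 \\[2ex]
\displaystyle \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) & m = 2
\end{cases} \tag{129}
\end{align}

which is the same as the expression in (67). 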
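A conditional density should integrate to one, which gives a simple numerical check of the final expression. The sketch below assumes the density has the form $\frac{R^{m-2}}{(m-3)!}\frac{e^{-(R+Kr^2)/(1+r^2)}}{1+r^2}\int_0^1 (1-a)^{m-3} e^{-Rr^2(1-a)/(1+r^2)} I_0\big(\tfrac{2\sqrt{KRa}\,r}{1+r^2}\big)\,da$ for $m \ge 3$, and $\frac{e^{-(R+Kr^2)/(1+r^2)}}{1+r^2} I_0\big(\tfrac{2\sqrt{KR}\,r}{1+r^2}\big)$ for $m = 2$. This form is a reading of the derivation above and should be treated as an assumption. The helper names, the truncation point `R_max`, and the grid sizes are illustrative choices.

```python
import math

def bessel_i0(x):
    """Modified Bessel function of the first kind, order zero, via its power series."""
    q = 0.25 * x * x
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def density_R(R, r, K, m, n_a=40):
    """Conditional density of R given r and K, in the form assumed in the lead-in."""
    s = 1.0 + r * r
    pref = math.exp(-(R + K * r * r) / s) / s
    if m == 2:
        return pref * bessel_i0(2.0 * math.sqrt(K * R) * r / s)
    inner = 0.0
    for i in range(n_a):  # midpoint rule over a in (0, 1)
        a = (i + 0.5) / n_a
        inner += ((1.0 - a) ** (m - 3)
                  * math.exp(-R * r * r * (1.0 - a) / s)
                  * bessel_i0(2.0 * math.sqrt(K * R * a) * r / s))
    inner /= n_a
    return R ** (m - 2) / math.factorial(m - 3) * pref * inner

def total_mass(r, K, m, R_max=80.0, n_R=800):
    """Trapezoidal estimate of the integral of density_R over [0, R_max]."""
    h = R_max / n_R
    vals = [density_R(i * h, r, K, m) for i in range(n_R + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

if __name__ == "__main__":
    for m in (2, 3, 4):
        # Each value should be close to 1 if the assumed form is a valid density.
        print(m, total_mass(r=1.0, K=1.0, m=m))
```

The truncation at `R_max` is harmless for moderate r and K because the integrand decays exponentially in R; larger values of r or K would call for a larger range.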



B. Proof of Theorem 5 

1) Existence of the Capacity-Achieving Input Distribution: An optimal distribution exists if the space of input 
distribution functions over which the maximization is performed is compact, and the objective functional is weak* 
continuous [39]. The compactness of the space of input distributions with second moment constraints is shown 
in [4]. The compactness for the more stringent case of peak limited inputs follows immediately from this result. 
Therefore, we need only to show the weak* continuity of $I(\cdot|\hat h)$. The weak* continuity of the functional $I(\cdot|\hat h)$ 
is equivalent to 

\[ F_n \xrightarrow{w^*} F \implies I(F_n|\hat h) \to I(F|\hat h). \tag{130} \]

We first note the upper bound 

\[ f_{R|r,K}(R|r,K) \le \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) \tag{131} \]

which is obtained from the bound $(1-a)^{m-3} e^{-\frac{Rr^2(1-a)}{1+r^2}} I_0\!\left(\frac{2\sqrt{KRa}\,r}{1+r^2}\right) \le I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right)$ $\forall a \in [0,1]$. The upper 
bound in (131) is bounded for all $r \in [0,\sqrt{L}]$ and also for all $R \ge 0$ due to the exponential decrease in $R$ of 
the exponential factor. Since $f_{R|r,K}(R|r,K)$ and $\log(1+r^2)$ are continuous and bounded functions for all $r \in [0,\sqrt{L}]$ and 
$R \ge 0$, by the definition of weak convergence [39], 

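\[ F_n \xrightarrow{w^*} F \implies \int_0^\infty \log(1+r^2)\, dF_n(r) \to \int_0^\infty \log(1+r^2)\, dF(r) \tag{132} \]

and 

\[ F_n \xrightarrow{w^*} F \implies \int_0^\infty f_{R|r,K}(R|r,K)\, dF_n(r) \to \int_0^\infty f_{R|r,K}(R|r,K)\, dF(r) \tag{133} \]

for all $R \ge 0$. Therefore, we have 

\[ F_n \xrightarrow{w^*} F \implies g(R,F_n,K) \to g(R,F,K) \quad \forall R \ge 0. \tag{134} \]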



Note that the mutual information in (66) can also be written as 

\[ I(F_r|\hat h) = -\int_0^\infty dK\, f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R,F_r,K) \log g(R,F_r,K) - E_r\{\log(1+r^2)\} - (m-1). \tag{135} \]

The weak* continuity of the second term on the right-hand side of (135) follows from (132). In order to show 
(130) and hence the weak* continuity of the mutual information, we need to prove 

\begin{align}
& \lim_{n\to\infty} \int_0^\infty dK\, f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R,F_n,K) \log g(R,F_n,K) \tag{136} \\
&= \int_0^\infty dK\, f_K(K) \lim_{n\to\infty} \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R,F_n,K) \log g(R,F_n,K) \tag{137} \\
&= \int_0^\infty dK\, f_K(K) \int_0^\infty dR\, \lim_{n\to\infty} \frac{R^{m-2}}{(m-2)!}\, g(R,F_n,K) \log g(R,F_n,K) \tag{138} \\
&= \int_0^\infty dK\, f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R,F,K) \log g(R,F,K). \tag{139}
\end{align}

(139) follows from (134) and the continuity of the function $x \log x$. In order to justify the interchanges of the 
limit and integral in (137) and (138), we invoke the Dominated Convergence Theorem [40], which requires an 
integrable upper bound on the integrand. We first find the following upper bound on the function $g$: 

\begin{align}
g(R,F_n,K) &\le (m-2) \int_0^{\sqrt{L}} \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) dF_n(r) \tag{140} \\
&\le (m-2)\, e^{-\frac{R}{1+L}}\, I_0\!\left(\sqrt{KR}\right) \int_0^{\sqrt{L}} \frac{e^{-\frac{Kr^2}{1+r^2}}}{1+r^2}\, dF_n(r) \tag{141} \\
&\le (m-2)\, e^{-\frac{R}{1+L}}\, I_0\!\left(\sqrt{KR}\right) \tag{142} \\
&= u(R,K) \quad \forall n,\ \forall R, K \ge 0. \tag{143}
\end{align}

(140) follows from the upper bound in (131). (141) is obtained by noting that $e^{-\frac{R}{1+r^2}} \le e^{-\frac{R}{1+L}}$ for all $r \in [0,\sqrt{L}]$ 
and $R \ge 0$, and $I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) \le I_0\!\left(\sqrt{KR}\right)$ $\forall R, r \ge 0$. Finally, (142) follows from the observation that 
the integrand in (141) is less than 1 $\forall r, K \ge 0$. Note that the upper bound $u(R,K)$ is not a function of $n$ and 
decreases exponentially in $R$ for sufficiently large values of $R$. Next, we find the following upper bound: 

\begin{align}
\left| \frac{R^{m-2}}{(m-2)!}\, g(R,F_n,K) \log g(R,F_n,K) \right| &\le \frac{R^{m-2}}{(m-2)!} \left( 4 g^{0.9}(R,F_n,K) + g^2(R,F_n,K) \right) \tag{144} \\
&\le \frac{R^{m-2}}{(m-2)!} \left( 4 u^{0.9}(R,K) + u^2(R,K) \right) \quad \forall R, K \ge 0. \tag{145}
\end{align}



(144) follows from the fact that $|x\log(x)| \le 4x^{0.9} + x^2$ for all $x \ge 0$, and (145) follows from (143). Note that 
the upper bound in (145) does not depend on $F_n$ and is integrable due to the exponential decay of $u(R,K)$ in $R$ 
for sufficiently large values of $R$. Applying the Dominated Convergence Theorem with the upper bound in (145) 
justifies (138). We further consider 

\begin{align}
\left| f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R,F_n,K) \log g(R,F_n,K) \right| &\le f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!} \left| g(R,F_n,K) \log g(R,F_n,K) \right| \tag{146} \\
&\le f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!} \left( 4 g^{0.9}(R,F_n,K) + g^2(R,F_n,K) \right). \tag{147}
\end{align}
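The elementary inequality behind (144) and (147) can be spot-checked numerically. The sketch below assumes that the inequality reads $|x \log x| \le 4x^{0.9} + x^2$ with the natural logarithm, and it evaluates the ratio of the two sides on a logarithmic grid. The grid range and the function names are illustrative choices; a grid check is not a proof.

```python
import math

def lhs(x):
    """|x log x| with the convention 0 log 0 = 0 (natural logarithm)."""
    return 0.0 if x == 0.0 else abs(x * math.log(x))

def rhs(x):
    """Claimed upper bound 4 x^0.9 + x^2."""
    return 4.0 * x ** 0.9 + x * x

def worst_ratio(exponents=range(-300, 301), per_decade=10):
    """Largest lhs/rhs ratio over the grid x = 10^(e / per_decade)."""
    worst = 0.0
    for e in exponents:
        x = 10.0 ** (e / per_decade)
        worst = max(worst, lhs(x) / rhs(x))
    return worst

if __name__ == "__main__":
    # A value below 1 is consistent with the inequality on the sampled grid.
    print(worst_ratio())
```

For small x the ratio is governed by $x^{0.1}|\ln x|/4$, whose maximum over $x \in (0,1)$ is $10/(4e) \approx 0.92$, so the constant 4 leaves little slack. With a base-2 logarithm the same constant would not suffice.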
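Note that $f_K(K) = \frac{1}{E\{K\}}\, e^{-\frac{K}{E\{K\}}}$ where $E\{K\} = \frac{\gamma^2 \delta m P}{N_0}$. The integral of the upper bound $u(R,K)$ with respect to 
$R$ increases exponentially with $K$. Hence, we need to find a tighter upper bound. We have 

\begin{align}
g(R,F_n,K) &\le (m-2) \int_0^{\sqrt{L}} \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) dF_n(r) \tag{148} \\
&\le (m-2) \int_0^{\sqrt{L}} \frac{1}{1+r^2}\, e^{-\frac{R+Kr^2-2\sqrt{KR}\,r}{1+r^2}}\, dF_n(r) \tag{149} \\
&\le (m-2) \int_0^{\sqrt{L}} e^{-\frac{(\sqrt{R}-\sqrt{K}\,r)^2}{1+L}}\, dF_n(r) \tag{150} \\
&\le \begin{cases}
(m-2) & R \le KL \\[1ex]
(m-2)\, e^{-\frac{(\sqrt{R}-\sqrt{KL})^2}{1+L}} & R > KL
\end{cases} \tag{151} \\
&= v(R,K) \quad \forall n,\ \forall R, K \ge 0 \tag{152}
\end{align}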



where (149) follows from the fact that $I_0(x) \le e^x$, and (150) follows by choosing the largest value $r = \sqrt{L}$ in 
the denominator of the exponential function, together with $\frac{1}{1+r^2} \le 1$. (151) is obtained by noting that $(\sqrt{R}-\sqrt{K}\,r)^2$ is a nonnegative 
quadratic function of $r$, minimized at $r = \sqrt{R/K}$. Hence, if $L \ge \frac{R}{K}$, the minimum value of the quadratic function 
over $r \in [0,\sqrt{L}]$ is zero. Otherwise, it is $(\sqrt{R}-\sqrt{KL})^2$. From (147) and (152), we have 

\[ \left| f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R,F_n,K) \log g(R,F_n,K) \right| \le f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!} \left( 4 v^{0.9}(R,K) + v^2(R,K) \right). \tag{153} \]

Note that the upper bound in (153) is independent of $F_n$. It can also be verified easily that this upper bound 
is integrable with respect to $K$ because $f_K$ decreases exponentially with $K$ while the integral in the 
upper bound produces a result that is at most polynomial in $K$. Applying the Dominated Convergence Theorem 
with the integrable upper bound in (153) justifies (137). Hence, the proof is complete. 
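The step from the pointwise Gaussian-type bound to the envelope $v(R,K)$ can be illustrated numerically. The sketch below assumes that the pointwise bound for an input mass at $r$ is $(m-2)\,e^{-(\sqrt{R}-\sqrt{K}r)^2/(1+L)}$ and that the envelope equals $m-2$ for $R \le KL$ and $(m-2)\,e^{-(\sqrt{R}-\sqrt{KL})^2/(1+L)}$ otherwise. Both forms are readings of the bounds above. The parameter values $L = 2$ and $m = 4$, the grids, and the function names are arbitrary illustrative choices.

```python
import math

def envelope_v(R, K, L, m):
    """Envelope assumed for v(R, K): flat for R <= K L, Gaussian-type decay in sqrt(R) beyond."""
    if R <= K * L:
        return float(m - 2)
    return (m - 2) * math.exp(-(math.sqrt(R) - math.sqrt(K * L)) ** 2 / (1.0 + L))

def pointwise_bound(R, K, r, L, m):
    """Assumed bound for a single input magnitude r in [0, sqrt(L)]."""
    return (m - 2) * math.exp(-(math.sqrt(R) - math.sqrt(K) * r) ** 2 / (1.0 + L))

def envelope_dominates(L=2.0, m=4, n=40):
    """Check pointwise_bound <= envelope_v on a grid of (R, K, r) values."""
    r_max = math.sqrt(L)
    for i in range(n + 1):
        R = 0.5 * i
        for j in range(n + 1):
            K = 0.25 * j
            for k in range(n + 1):
                r = r_max * k / n
                # Small relative slack absorbs floating-point rounding at r = sqrt(L).
                if pointwise_bound(R, K, r, L, m) > envelope_v(R, K, L, m) * (1.0 + 1e-12):
                    return False
    return True

if __name__ == "__main__":
    print(envelope_dominates())
```

Because the envelope does not depend on $r$, integrating the pointwise bound against any distribution supported on $[0,\sqrt{L}]$ preserves the inequality, which is what makes the resulting bound independent of $F_n$.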



2) Sufficient and Necessary Kuhn-Tucker Condition: The proof of the sufficient and necessary condition in 
(70) follows along the same lines as those in [4] and [16]. The weak derivative of $I(\cdot|\hat h)$ at $F_0$ is defined as 

\[ I'_{F_0}(F|\hat h) = \lim_{\theta \to 0^+} \frac{I\big((1-\theta)F_0 + \theta F \,\big|\, \hat h\big) - I(F_0|\hat h)}{\theta}. \tag{154} \]

The weak derivative of the mutual information in (66) is obtained as 

\begin{align}
I'_{F_0}(F|\hat h) = {} & E_K\left\{ \int dF_0(r) \int f_{R|r,K}(R|r,K) \log g(R,F_0,K)\, dR \right\} \nonumber \\
& - E_K\left\{ \int dF(r) \int f_{R|r,K}(R|r,K) \log g(R,F_0,K)\, dR \right\} + \int dF_0(r) \log(1+r^2) - \int dF(r) \log(1+r^2). \tag{155}
\end{align}

Note that if $F_0$ is indeed the maximizing distribution and hence capacity achieving, then $I'_{F_0}(F|\hat h) \le 0$ for all $F$ 
satisfying the peak power constraint. Then using the same steps in [4, Appendix II, Theorem 4], we can show 
that $F_0$ is a capacity-achieving input distribution if and only if 

\[ E_K\left\{ \int_0^\infty f_{R|r,K}(R|r,K) \log g(R,F_0,K)\, dR \right\} + \log(1+r^2) + mC + m - 1 \ge 0 \quad \forall r \in [0,\sqrt{L}] \tag{156} \]

with equality at the points of increase of distribution $F_0$. 

C. Analyticity of the Kuhn-Tucker Condition in the Complex Domain 

We consider the following function which is the left-hand side of the Kuhn-Tucker condition (70) in the 
complex domain: 

\[ \Phi(z) = E_K\left\{ \int_0^\infty f_{R|r,K}(R|z,K) \log g(R,F_r,K)\, dR \right\} + \log(1+z^2) + mC + (m-1). \tag{157} \]

Note that $\log(1+z^2)$ is an analytic function of $z = z_r + jz_i$ in the entire complex plane excluding the part of the imaginary axis 
with $|z_i| \ge 1$, because the principal branch of the logarithm fails to be analytic only on the nonpositive real line. Next, 
we investigate the region in which the first term of (157) is analytic. We first note the Differentiation Lemma. 

Differentiation Lemma 1: [42, Sec. XII] Let $I$ be an interval of real numbers, possibly infinite. Let $U$ be an 
open set of complex numbers. Let $f = f(t,z)$ be a continuous function on $I \times U$. Assume: 

(i) For each compact subset $K$ of $U$ the integral $\int_I f(t,z)\, dt$ is uniformly convergent for $z \in K$. 

(ii) For each $t$ the function $z \mapsto f(t,z)$ is analytic. 

Let $F(z) = \int_I f(t,z)\, dt$. Then $F$ is analytic on $U$ and $F'(z) = \int_I Df(t,z)\, dt$ where $D$ is the differentiation operator. Furthermore $Df(t,z)$ 
satisfies the same hypothesis as $f$. □ 
The integral $\int_I f(t,z)\, dt$ is said to be uniformly convergent [42] for $z \in K$ if, given $\epsilon > 0$, there exists 
$B_0$ such that if $B_0 < B_1 < B_2$, then $\left| \int_{B_1}^{B_2} f(t,z)\, dt \right| < \epsilon$. From this definition it can be easily shown that if 
$\int_0^\infty |f(t,z)|\, dt < \infty$, then $\int_0^\infty f(t,z)\, dt$ is uniformly convergent. 
The function 

\[ f_{R|r,K}(R|z,K) = \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\frac{R+Kz^2}{1+z^2}}}{1+z^2} \int_0^1 (1-a)^{m-3} e^{-\frac{Rz^2(1-a)}{1+z^2}} I_0\!\left(\frac{2\sqrt{KRa}\,z}{1+z^2}\right) da \tag{158} \]

is analytic in the entire complex plane excluding the points at $z = \pm j$ because rational functions are analytic 
everywhere except at the points that make the denominator zero; the exponential function and $I_0$ are analytic 
everywhere because they can be expanded as power series; and if $g$ and $f$ are analytic then $g \circ f$ is also 
analytic in the corresponding region. The analyticity of the integral in (158) can also be easily verified using the 
Differentiation Lemma since the integration is over a finite interval. In order to find the region in which the 
first term on the right-hand side of (157) is analytic, we need to find the region $\mathcal{D}$ that satisfies for all $z \in \mathcal{D}$ 



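\[ \int_0^\infty dK\, f_K(K) \int_0^\infty \left| f_{R|r,K}(R|z,K) \right| \left| \log g(R,F_r,K) \right| dR < \infty. \tag{159} \]

We consider 

\begin{align}
\left| f_{R|r,K}(R|z,K) \right| &= \left| \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\frac{R+Kz^2}{1+z^2}}}{1+z^2} \int_0^1 (1-a)^{m-3} e^{-\frac{Rz^2(1-a)}{1+z^2}} I_0\!\left(\frac{2\sqrt{KRa}\,z}{1+z^2}\right) da \right| \tag{160} \\
&\le \frac{R^{m-2}}{(m-3)!}\, \frac{\left| e^{-\frac{R+Kz^2}{1+z^2}} \right|}{|1+z^2|} \int_0^1 (1-a)^{m-3} \left| e^{-\frac{Rz^2(1-a)}{1+z^2}} \right| \left| I_0\!\left(\frac{2\sqrt{KRa}\,z}{1+z^2}\right) \right| da \tag{161} \\
&\le \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\Re\left\{\frac{R+Kz^2}{1+z^2}\right\}}}{|1+z^2|} \int_0^1 (1-a)^{m-3} e^{-\Re\left\{\frac{Rz^2(1-a)}{1+z^2}\right\}} I_0\!\left(\Re\left\{\frac{2\sqrt{KRa}\,z}{1+z^2}\right\}\right) da \tag{162} \\
&\le \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\Re\left\{\frac{R+Kz^2}{1+z^2}\right\}}}{|1+z^2|} \int_0^1 (1-a)^{m-3} e^{-\Re\left\{\frac{Rz^2(1-a)}{1+z^2}\right\}} e^{\left|\Re\left\{\frac{2\sqrt{KRa}\,z}{1+z^2}\right\}\right|} da \tag{163} \\
&\le \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\Re\left\{\frac{R+Kz^2}{1+z^2}\right\}}}{|1+z^2|}\, e^{\left|\Re\left\{\frac{2\sqrt{KR}\,z}{1+z^2}\right\}\right|} \int_0^1 (1-a)^{m-3}\, da \tag{164} \\
&= \frac{R^{m-2}}{(m-2)!}\, \frac{1}{|1+z^2|}\, e^{-\Re\left\{\frac{R+Kz^2}{1+z^2}\right\}}\, e^{\left|\Re\left\{\frac{2\sqrt{KR}\,z}{1+z^2}\right\}\right|} \tag{165} \\
&= \frac{R^{m-2}}{(m-2)!}\, \frac{1}{|1+z^2|}\, e^{-K\frac{(z_r^2-z_i^2)(1+z_r^2-z_i^2)+4z_r^2z_i^2}{|1+z^2|^2}}\, e^{-\frac{R(1+z_r^2-z_i^2)}{|1+z^2|^2}}\, e^{\frac{2\sqrt{KR}\,\left|z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right|}{|1+z^2|^2}} \tag{166} \\
&= \frac{R^{m-2}}{(m-2)!}\, \frac{1}{|1+z^2|}\, e^{-K\frac{(z_r^2-z_i^2)(1+z_r^2-z_i^2)+4z_r^2z_i^2}{|1+z^2|^2}}\, e^{-\frac{1}{|1+z^2|^2}\left( \sqrt{R(1+z_r^2-z_i^2)} - \sqrt{K}\,\frac{\left|z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right|}{\sqrt{1+z_r^2-z_i^2}} \right)^2}\, e^{K\frac{\left(z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right)^2}{(1+z_r^2-z_i^2)|1+z^2|^2}} \tag{167} \\
&= \frac{R^{m-2}}{(m-2)!}\, \frac{1}{|1+z^2|}\, e^{K\frac{\left(z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right)^2 - (1+z_r^2-z_i^2)\left((z_r^2-z_i^2)(1+z_r^2-z_i^2)+4z_r^2z_i^2\right)}{(1+z_r^2-z_i^2)|1+z^2|^2}}\, e^{-\frac{1}{|1+z^2|^2}\left( \sqrt{R(1+z_r^2-z_i^2)} - \sqrt{K}\,\frac{\left|z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right|}{\sqrt{1+z_r^2-z_i^2}} \right)^2} \tag{168} \\
&= \frac{R^{m-2}}{(m-2)!}\, \frac{1}{|1+z^2|}\, e^{K\frac{z_i^2(1+z_r^2-z_i^2)+\frac{4z_r^2z_i^4}{1+z_r^2-z_i^2}}{|1+z^2|^2}}\, e^{-\frac{1}{|1+z^2|^2}\left( \sqrt{R(1+z_r^2-z_i^2)} - \sqrt{K}\,\frac{\left|z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right|}{\sqrt{1+z_r^2-z_i^2}} \right)^2}. \tag{169}
\end{align}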

In the above formulations, $\Re(z)$ denotes the real part of the complex-valued number $z = z_r + jz_i$ whose real 
and imaginary components are also denoted by $z_r$ and $z_i$, respectively. (161) follows by taking the absolute 
value of the integrand instead of the absolute value of the integral. (162) follows from the facts that $|e^z| = e^{\Re(z)}$ 
and $|I_0(z)| \le I_0(\Re(z))$. (163) is due to $I_0(x) \le e^{|x|}$ for a real number $x$. (164) is obtained from the 
bounds $e^{-\Re\left\{\frac{Rz^2(1-a)}{1+z^2}\right\}} \le 1$, which holds for all $a \in [0,1]$ and $|z_r| \ge |z_i|$, and $e^{\left|\Re\left\{\frac{2\sqrt{KRa}\,z}{1+z^2}\right\}\right|} \le e^{\left|\Re\left\{\frac{2\sqrt{KR}\,z}{1+z^2}\right\}\right|}$ 
$\forall a \in [0,1]$. (165) follows by evaluating the integral in (164), in which the only term that 
depends on $a$ is $(1-a)^{m-3}$. (166) is obtained by explicitly expressing the real parts in the exponents in terms of $z_r$ 
and $z_i$, the real and imaginary components of $z$. (167) follows by expressing the exponents of the second and 
third exponential functions as a quadratic function of $\sqrt{R}$. Eventually, (169) is obtained from straightforward 
algebraic computations. 
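The step that expresses the real parts in terms of $z_r$ and $z_i$ can be checked with ordinary complex arithmetic. The sketch below compares direct evaluation of $\Re\{1/(1+z^2)\}$, $\Re\{z/(1+z^2)\}$, and $\Re\{z^2/(1+z^2)\}$ with candidate closed forms over $|1+z^2|^2$. The closed forms are derived here for illustration and are assumptions about what the explicit expressions look like. The sketch also checks that the third quantity is nonnegative when $|z_r| \ge |z_i|$, which is the condition quoted for the first bound used in (164). All function names and grids are illustrative.

```python
def direct_real_parts(z):
    """Re{1/(1+z^2)}, Re{z/(1+z^2)}, Re{z^2/(1+z^2)} computed with complex arithmetic."""
    w = 1.0 + z * z
    return (1.0 / w).real, (z / w).real, (z * z / w).real

def closed_form_real_parts(z):
    """Candidate closed forms in terms of z_r and z_i (assumed, see lead-in)."""
    zr, zi = z.real, z.imag
    A = 1.0 + zr * zr - zi * zi          # real part of 1 + z^2
    D = abs(1.0 + z * z) ** 2            # |1 + z^2|^2
    return (A / D,
            (zr * A + 2.0 * zr * zi * zi) / D,
            ((zr * zr - zi * zi) * A + 4.0 * zr * zr * zi * zi) / D)

def grid():
    """Grid of complex points, skipping the poles z = +j and z = -j."""
    for i in range(13):
        for k in range(-8, 9):
            z = complex(0.25 * i, 0.25 * k)
            if abs(1.0 + z * z) > 1e-9:
                yield z

def max_mismatch():
    """Largest absolute gap between direct and closed-form values on the grid."""
    worst = 0.0
    for z in grid():
        for d, c in zip(direct_real_parts(z), closed_form_real_parts(z)):
            worst = max(worst, abs(d - c))
    return worst

def min_third_in_cone():
    """Smallest Re{z^2/(1+z^2)} over grid points with |z_r| >= |z_i|."""
    return min(direct_real_parts(z)[2] for z in grid() if abs(z.real) >= abs(z.imag))

if __name__ == "__main__":
    print(max_mismatch(), min_third_in_cone())
```

A mismatch near machine precision supports the closed forms on the sampled points, and a nonnegative minimum is consistent with the bound $e^{-\Re\{Rz^2(1-a)/(1+z^2)\}} \le 1$ in the region $|z_r| \ge |z_i|$.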

The following lower bound on $g(R,F_r,K)$ can easily be verified by noting that $e^{\frac{Rr^2a}{1+r^2}} \ge 1$ and $I_0(x) \ge 1$ for all 
$x \ge 0$: 

\[ g(R,F_r,K) \ge e^{-R} \int_0^{\sqrt{L}} \frac{e^{-\frac{Kr^2}{1+r^2}}}{1+r^2}\, dF_r(r) \ge \frac{e^{-R-K}}{1+L}. \tag{170} \]

From the above lower bound, we see that $|\log g(R,F_r,K)|$ increases at most linearly in both $R$ and $K$ for 
sufficiently large values of $R$ and $K$. Therefore, if $(1+z_r^2-z_i^2) > 0$, then the upper bound in (169) decreases 
exponentially in $R$, and as a result, the inner integral in (159) converges. This condition is satisfied in the region 
where $|z_r| \ge |z_i|$. 

Note that the upper bound in (169) increases exponentially in $K$. However, the value of the function 

\[ c(z_r,z_i) = \frac{z_i^2(1+z_r^2-z_i^2) + \frac{4z_r^2z_i^4}{1+z_r^2-z_i^2}}{|1+z^2|^2} \tag{171} \]

can be made arbitrarily small by choosing arbitrarily small values for $|z_i|$. Note also that $f_K(K) = \frac{1}{E\{K\}}\, e^{-\frac{K}{E\{K\}}}$ 
where $E\{K\} = \frac{\gamma^2 \delta m P}{N_0}$. Hence, in the region where $c(z_r,z_i) < \frac{N_0}{\gamma^2 \delta m P}$, the integrand in (159) decreases exponentially 
in $K$ and, as a result, the integral converges. It can be shown that for a fixed $|z_i| < 1$, $c(z_r,z_i)$ is 
a monotonically decreasing function of $z_r \ge 0$, achieving its maximum of $\frac{z_i^2}{1-z_i^2}$ at $z_r = 0$. Hence, we consider 
the following region in the complex domain: 



V = J {zr, Zi):Q<Zr< mill { J ^^^^ j and \z,\ < 



This region includes the positive real line. 
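The claims about $c(z_r,z_i)$ become easier to verify once the function is simplified. The short derivation below assumes that (171) reads $c(z_r,z_i) = \big[z_i^2(1+z_r^2-z_i^2) + 4z_r^2z_i^4/(1+z_r^2-z_i^2)\big]/|1+z^2|^2$; this reading is an assumption reconstructed from the surrounding bounds, and the shorthand $A$ is introduced here only for illustration.

```latex
\begin{align*}
A &:= 1 + z_r^2 - z_i^2, \qquad |1+z^2|^2 = A^2 + 4 z_r^2 z_i^2, \\
c(z_r,z_i) &= \frac{z_i^2 A + \dfrac{4 z_r^2 z_i^4}{A}}{A^2 + 4 z_r^2 z_i^2}
            = \frac{z_i^2 \left( A^2 + 4 z_r^2 z_i^2 \right)}{A \left( A^2 + 4 z_r^2 z_i^2 \right)}
            = \frac{z_i^2}{1 + z_r^2 - z_i^2}.
\end{align*}
```

Under this assumption, for fixed $|z_i| < 1$ the function decreases as $z_r^2$ grows and equals $z_i^2/(1-z_i^2)$ at $z_r = 0$. The requirement $c(z_r,z_i) < 1/E\{K\}$ then holds for every $z_r$ whenever $z_i^2 < 1/(1+E\{K\})$.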



In region $\mathcal{D}$, $c(z_r,z_i) < \frac{N_0}{\gamma^2 \delta m P}$ and $|z_r| \ge |z_i|$. Hence, the integral in (159) converges in this region. Moreover, 